error in bcbio structural variant calling #653

Closed
shang-qian opened this issue Oct 28, 2014 · 47 comments

@shang-qian

Hi Brad,

Thanks for your help. I want to call structural variants, but I get an error: parallel, svtyper, cnvnator_wrapper.py, cnvnator-multi, and annotate_rd.py are not found in the PATH:

[2014-10-27 23:05] Uncaught exception occurred
Traceback (most recent call last):
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 20, in run
_do_run(cmd, checks, log_stdout)
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 93, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; speedseq sv -v -B ......
Sourcing executables from /public/software/bcbio-nextgen/tools/bin/speedseq.config ...
which: no parallel in (/public/software/bcbio-nextgen/tools/bin:/public/software/bcbio-nextgen/anaconda/bin:.....)
which: no svtyper in (/public/software/bcbio-nextgen/tools/bin:/public/software/bcbio....
which: no cnvnator_wrapper.py in (/public/software/bcbio-nextgen/tools/bin:/public/software/bcbio....
which: no cnvnator-multi in (/public/software/bcbio-nextgen/tools/bin:/public/software/bcbio-....
which: no annotate_rd.py in (/public/software/bcbio-nextgen/tools/bin:/....)
Calculating alignment stats... sambamba-view: (Broken pipe)
Traceback (most recent call last):
File "/public/software/bcbio-nextgen/tools/share/lumpy-sv/pairend_distro.py", line 12, in
import numpy as np
ImportError: No module named numpy

How can I fix this? Thanks again.

Shangqian

@chapmanb
Member

Shangqian;
Thanks for the report and apologies about the issue. The problem was that speedseq, which wraps the lumpy calling, calls out to a lumpy python script that requires numpy. If your system python does not have numpy installed, it results in this error. The other messages about svtyper and cnvnator are not a problem as we don't use those within bcbio.

I pushed a fix which resolves the issue by ensuring we use the Python installed with bcbio, which does contain numpy. If you upgrade with:

bcbio_nextgen.py upgrade -u development

it will grab the latest code and should work cleanly now. Thanks again.
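
For reference, a quick way to check which Python has numpy available (paths follow the install locations in the traceback above) is:

python -c 'import numpy; print numpy.__version__'
/public/software/bcbio-nextgen/anaconda/bin/python -c 'import numpy; print numpy.__version__'

The first command checks the system Python that speedseq was picking up; the second checks the bcbio-bundled Python, which should succeed.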

@shang-qian
Author

Hi Brad,
Thanks so much; the genome SV calling works well now. There is another small question I am not sure about:
In my analysis, I had already called VCFs from a family using a single caller (gatk-hc). Now I want to use three callers and ensemble the resulting VCFs. My yaml file follows. Is it right? Thanks.
details:

  - files: [../input/sample08.bam]
    description: sample08
    metadata:
      batch: ceph
      sex: male
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      align_split_size: 5000000
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: [freebayes, gatk-haplotype]
      quality_format: Standard
      coverage_interval: regional
      validate: ../input/GiaB_NIST_RTG_v0_2.vcf.gz
      validate_regions: ../input/GiaB_NIST_RTG_v0_2_regions.bed
      variant_regions: ../input/NGv3.bed
      ensemble:
        format-filters: [DP < 4]
        classifiers:
          balance: [AD, FS, Entropy]
          calling: [ReadPosEndDist, PL, PLratio, Entropy, NBQ]
        classifier-params:
          type: svm
        trusted-pct: 0.65
  - files: [../input/sample09.bam]
    description: sample09
    metadata:
      batch: ceph
      sex: female
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      align_split_size: 5000000
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: [freebayes, gatk-haplotype]
      quality_format: Standard
      coverage_interval: regional
      validate: ../input/GiaB_NIST_RTG_v0_2.vcf.gz
      validate_regions: ../input/GiaB_NIST_RTG_v0_2_regions.bed
      variant_regions: ../input/NGv3.bed
      ensemble:
        format-filters: [DP < 4]
        classifiers:
          balance: [AD, FS, Entropy]
          calling: [ReadPosEndDist, PL, PLratio, Entropy, NBQ]
        classifier-params:
          type: svm
        trusted-pct: 0.65
  - files: [../input/sample10.bam]
    description: sample10
    metadata:
      batch: ceph
      sex: male
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      align_split_size: 5000000
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: [freebayes, gatk-haplotype]
      quality_format: Standard
      coverage_interval: regional
      remove_lcr: true
      validate: ../input/GiaB_NIST_RTG_v0_2.vcf.gz
      validate_regions: ../input/GiaB_NIST_RTG_v0_2_regions.bed
      variant_regions: ../input/NGv3.bed
      ensemble:
        format-filters: [DP < 4]
        classifiers:
          balance: [AD, FS, Entropy]
          calling: [ReadPosEndDist, PL, PLratio, Entropy, NBQ]
        classifier-params:
          type: svm
        trusted-pct: 0.65

@chapmanb
Member

Shangqian;
That generally looks good, although you only have 2 variant callers listed. You'll want to have 3 or more to get good results from ensemble calling: samtools and platypus are two other good choices. Glad the fix worked for you.
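
For example, a minimal change to the variantcaller line in the YAML above would be (samtools is picked here as the third caller; platypus would work the same way):

variantcaller: [freebayes, gatk-haplotype, samtools]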

@shang-qian
Author

Sorry, that was a typo on my part with the three callers :). Thanks again for your helpful suggestions and contributions. bcbio is great and very useful for me.

@shang-qian
Author

Hi Brad,

When I run the configuration above, I get the following error:
Exception in thread "main" java.lang.Exception: VCF files do not have consistent headers: ["ceph-gatk-haplotype.vcf.gz" "ceph-samtools.vcf.gz"]

I know the problem is in the VCF files, so I opened the two VCFs and found that the header sample orders differ: gatk-hc has sample10/sample8/sample9, but samtools has sample8/sample10/sample9. I fixed it by editing the headers to use the same order, but manually modifying them every time does not seem like a good approach.

Is there an automatic way to get consistent headers, either by modifying the input yaml file or within bcbio-nextgen?
Thanks.

kind regards,
Shangqian

chapmanb added a commit to chapmanb/bcbio.variation that referenced this issue Nov 7, 2014
…with identical file names. Thanks to Severine Catreux. Resort multisample inputs that have inconsistent sample orders. Fixes bcbio/bcbio-nextgen#653
@chapmanb
Member

chapmanb commented Nov 7, 2014

Shangqian;
Sorry about the issue. bcbio.variation did not explicitly sort input VCFs, which can cause issues with different callers that insist on sorting in specific ways. I pushed a fix which should resort these to a consistent order prior to doing ensemble calling. If you upgrade your tools with:

bcbio_nextgen.py upgrade --tools

and re-run it should hopefully work cleanly now. Thanks again for the reports.
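
For anyone who cannot upgrade right away, a possible manual workaround (a sketch, assuming bcftools is available; sample names come from the YAML above and the output name is illustrative) is to rewrite the offending VCF with a fixed sample order, since bcftools view -s emits samples in the order listed:

bcftools view -s sample08,sample09,sample10 ceph-samtools.vcf.gz -O z -o ceph-samtools-resorted.vcf.gz
tabix -p vcf ceph-samtools-resorted.vcf.gz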

@shang-qian
Author

Hi Brad,

Thanks for your response. I have updated bcbio-nextgen. Thanks a lot.
In addition, the log from my cancer yaml run shows that there was not enough memory for GATK, although this issue did not occur in the exome pipeline. Can you help me fix this? The error log content follows:

[2014-11-10 17:06] ##### ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java
[2014-11-10 17:06] ##### ERROR ------------------------------------------------------------------------------------------
[2014-11-10 17:06] Uncaught exception occurred
Traceback (most recent call last):
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
_do_run(cmd, checks, log_stdout)
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /public/software/bcbio-nextgen/tools/bin/gatk-framework -Xms166m -Xmx1166m -XX:+UseSerialGC -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L 9:96714156-127734373 -R /public/software/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -I /public/users/xieshangqian/project/LungC/bcbio/work/align/syn3-tumor/2_2014-11-03_dream-syn3-sort.bam --downsample_to_coverage 10000 -BQSR /public/users/xieshangqian/project/LungC/bcbio/work/align/syn3-tumor/2_2014-11-03_dream-syn3-sort.grp -o /public/users/xieshangqian/project/LungC/bcbio/work/bamprep/syn3-tumor/9/tx/tmpRv7YoC/2_2014-11-03_dream-syn3-sort-9_96714155_127734373-prep-prealign.bam

Kind regards,
Shangqian

@chapmanb
Member

Shangqian;
It looks like you need to allocate additional memory to GATK in your /public/software/bcbio-nextgen/galaxy/bcbio_system.yaml file, specifically increasing the -Xmx value under gatk. The cancer dataset is high depth (100x) and it looks like GATK needs additional memory to run effectively. Hope this helps.
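
As a sketch, the relevant section of bcbio_system.yaml might look like the following; the 3500m value is illustrative, so adjust it to the memory available per core on your machine:

resources:
  gatk:
    jvm_opts: ["-Xms500m", "-Xmx3500m"]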

@shang-qian
Author

Thank you, Brad. The cancer pipeline ran well. Many thanks for your help every time.

By the way, does bcbio require paired-end Read1 and Read2 to have the same length for bwa-mem alignment? Read1 and Read2 files of different lengths, trimmed by Trimmomatic, gave the error "paired reads have different names". As far as I know, bwa-mem should handle reads of different lengths, so is there some special setting or parameter in bcbio that I have missed? Thanks again. :)

@chapmanb
Member

Shangqian;
Glad that the cancer calling finished without any problems. bcbio/bwa-mem do not require reads to be the same length, but do require that all reads are paired. How did you run trimmomatic? The best approach is to use the paired end (PE) mode and feed the paired output into bcbio:

http://www.usadellab.org/cms/index.php?page=trimmomatic

It sounds like you may have trimmed each file separately or included the unpaired reads, which creates non-matching pair names in your fastq files. Hope this helps.
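
As a sketch, a paired-end Trimmomatic run looks like this (file names, the adapter file, and the 0.32 jar version are illustrative); only the two paired outputs should then be fed into bcbio:

java -jar trimmomatic-0.32.jar PE \
  sample_R1.fastq.gz sample_R2.fastq.gz \
  sample_R1.paired.fastq.gz sample_R1.unpaired.fastq.gz \
  sample_R2.paired.fastq.gz sample_R2.unpaired.fastq.gz \
  ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:36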

@shang-qian
Author

Hi Brad,
Thanks so much for your response. I used the NA12891 data for testing, where bcbio/bwa-mem had been OK before, so I am uncertain where the problem comes from now. My test was as follows:
I input the NA12891 R1 and R2 fastq files, and the error also occurred. These are the error messages:

[2014-11-20 18:16] [mem_sam_pe] paired reads have different names: "FFECCFHG>DEGCGGABGBCGEIDCFGGH:DF######", "AF@?EEFDB>B3<>FCD?BFBCGGGGFEHGEHE7GHHHHDEIHGE=>FECDE=AECCH/7?>DC@IH8DCFCFDC"
[2014-11-20 18:16] samblaster: Loaded 84 header sequence entries.
[2014-11-20 18:16] Uncaught exception occurred
Traceback (most recent call last):
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
_do_run(cmd, checks, log_stdout)
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /public/software/bcbio-nextgen/tools/bin/bwa mem -M -t 16 -R '@rg\tID:1\tPL:illumina\tPU:s1\tSM:s1' -v 1 /public/software/bcbio-nextgen/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa <(/public/software/bcbio-nextgen/tools/bin/grabix grab /public/users/xieshangqian/project/NAR/bcbio/work/align_prep/NA12891.R1.fastq.gz 20000000 39999999) <(/public/software/bcbio-nextgen/tools/bin/grabix grab /public/users/xieshangqian/project/NAR/bcbio/work/align_prep/NA12891.R2.fastq.gz 20000000 39999999) | /public/software/bcbio-nextgen/tools/bin/samblaster --splitterFile >(/public/software/bcbio-nextgen/tools/bin/samtools view -S -u /dev/stdin | /public/software/bcbio-nextgen/tools/bin/sambamba sort -t 16 -m 682M --tmpdir /public/users/xieshangqian/project/NAR/bcbio/work/tx/tmpnH0wTs/spl -o /public/users/xieshangqian/project/NAR/bcbio/work/align/s1/split/tx/tmp4ArZBx/s1-sort-20000000_39999999-sr.bam /dev/stdin) --discordantFile >(/public/software/bcbio-nextgen/tools/bin/samtools view -S -u /dev/stdin | /public/software/bcbio-nextgen/tools/bin/sambamba sort -t 16 -m 682M --tmpdir /public/users/xieshangqian/project/NAR/bcbio/work/tx/tmpnH0wTs/disc -o /public/users/xieshangqian/project/NAR/bcbio/work/align/s1/split/tx/tmpjoA75N/s1-sort-20000000_39999999-disc.bam /dev/stdin) | /public/software/bcbio-nextgen/tools/bin/samtools view -S -u /dev/stdin | /public/software/bcbio-nextgen/tools/bin/sambamba sort -t 16 -m 682M --tmpdir /public/users/xieshangqian/project/NAR/bcbio/work/tx/tmpnH0wTs/full -o /public/users/xieshangqian/project/NAR/bcbio/work/align/s1/split/tx/tmpZuouw8/s1-sort-20000000_39999999.bam /dev/stdin

I think the error is because the paired reads have different names, so I grepped for "FFECCFHG>DEGCGGABGBCGEIDCFGGH:DF######" and "AF@?EEFDB>B3<>FCD?BFBCGGGGFEHGEHE7GHHHHDEIHGE=>FECDE=AECCH/7?>DC@IH8DCFCFDC", which are both on line 20000000 of the NA12891 R1 and R2 fastq files. The results are:
R1 read: line 19999997-20000000
@206B4ABXX100825:6:61:6782:130154/1
AAATCTCACCACTTAACCCATACCAGACCAGACCCAAAAGGAAAGGCCGGGTTCAGTAACAACAACCTGGGTTCAA
+
DEFDIGHEAHDGFCCGGHHECAGHEFECH=HD>FFECCFHG>DEGCGGABGBCGEIDCFGGH:DF######
R2 read: line 19999997-20000000
@206B4ABXX100825:6:61:6782:130154/2
TTGTAGGGGTGTGATGCCGTGGACCCCTTCTTGAACCCCCAAGCTCGTCTTGCATTTGGGGCTCTAGCATGCAGCT
+
@af@?EEFDB>B3<>FCD?BFBCGGGGFEHGEHE7GHHHHDEIHGE=>FECDE=AECCH/7?>DC@IH8DCFCFDC

These results show that R1 and R2 sequences of the same length also hit the error, so I thought it might be a data problem. I then used awk to extract 1M fastq lines (19999997-20999996) as test NA12891_test R1 and R2 files.

When I ran these test files, which contain just lines 19999997-20999996 from the original files and include the @206B4ABXX100825:6:61:6782:130154/1 and /2 reads, everything worked well without any error.

So I am uncertain where the problem is. Any advice would be appreciated. Thanks again.

Shangqian

@chapmanb
Member

Shangqian;
Thanks for including the full traceback; that is very helpful. This is due to a change in grabix, the tool we use for indexing fastq files when running in sections. You may have updated the code or tools separately, and this fix requires updating both together. You can fix it either by removing align_split_size and running without splitting, or by getting the latest code and tools:

bcbio_nextgen.py upgrade -u development --tools

You may need to also remove alignprep/*.gbi to force the creation of new indexes. Hope this fixes the issue for you.
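
Based on the align_prep path in the traceback above, removing the stale indexes would look something like this (the path is from this particular run; adjust to your own work directory):

rm /public/users/xieshangqian/project/NAR/bcbio/work/align_prep/*.gbi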

@shang-qian
Author

Brad,
Many thanks for your earlier detailed advice; the whole exome and genome runs are working on our HPC now. There are two more questions that need your help:

  1. There are 32 samples in my exome dataset, and I want to run bcbio across multiple nodes. The 0.8.2 documentation says "bcbio_nextgen.py bcbio_sample.yaml -t ipython -n 12 -s lsf -q queue" can do this, but I am a little confused about the -s and -q parameters. Should I change lsf and queue to match our cluster, or is keeping the defaults OK?
  2. RNA pipeline error: "[2014-11-28 21:23] ../rnaseq/ref-transcripts.dexseq.gff3 was not found, so exon-level counting is being skipped."
     The ../rnaseq/ folder only contains the ref-transcripts.dexseq.gff file, so how can I fix this? Is linking the .gff file to a .gff3 name with "ln -s ref-transcripts.dexseq.gff ref-transcripts.dexseq.gff3" right?

My yaml file is:
details:

  - algorithm:
      adapters:
        - truseq
        - polya
      aligner: tophat2
      quality_format: Standard
      strandedness: unstranded
      trim_reads: read_through
    analysis: RNA-seq
    description: Test_rep2
    files:
      - /public/users/zhusimin/Xie_project/Xiang_RNA/raw/fastq_raw/Ptf1aKO/ptf1amut1_R1.fastq
      - /public/users/zhusimin/Xie_project/Xiang_RNA/raw/fastq_raw/Ptf1aKO/ptf1amut1_R2.fastq
    genome_build: mm10

Thanks again for your helpful advice.

Best,
Shangqian

@roryk
Collaborator

roryk commented Dec 1, 2014

Hi Shangqian,

Sorry about the DEXSeq issue; linking will fix it, since our pre-built indices have the wrong extension.

For the scheduler and queue: on your HPC, is there a job scheduler that you submit your jobs to, which distributes them over the nodes? There are a number of different schedulers; LSF is one, and there are others like SLURM and SGE. If you can find out which scheduler your HPC is running, pass that as the scheduler (-s), and pass the queue you are allowed to submit jobs to as the queue (-q).
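
For example, on a Torque/PBS cluster with a queue named high (the setup that comes up later in this thread), the call would look like:

bcbio_nextgen.py bcbio_sample.yaml -t ipython -n 12 -s torque -q high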

roryk added a commit that referenced this issue Dec 1, 2014
If you explicitly set a get_x or set_x function, it will not be
overwritten. This lets special complicated cases get handled inside
there without changing the function signature everywhere.

Fix to allow for .gff and .gff3 extensions for the DEXseq file.
Addresses an issue raised in #653.
@roryk
Collaborator

roryk commented Dec 1, 2014

Shangqian,

I fixed this DEXSeq behavior so now it will find either .dexseq.gff or .dexseq.gff3 files here: 0e9c746.

@shang-qian
Author

Hi Roryk,

Thanks for your prompt response; it helps me so much. Thanks a lot.

best,
Shangqian

@shang-qian
Author

RoryK,

The gff problem has been fixed, but the other issue remains. The error shows:
[2014-12-02 11:12] multiprocessing: generate_transcript_counts
Error in find.package("DEXSeq") : there is no package called 'DEXSeq'

So how can I install the DEXSeq package for bcbio? Thanks.

@roryk
Collaborator

roryk commented Dec 2, 2014

Hi Shangqian,

Hm-- it should be getting installed automatically. If you fire up R and do:

source("http://bioconductor.org/biocLite.R")
biocLite("DEXSeq")

it should install it.

@shang-qian
Author

Hi Roryk,

I installed DEXSeq with R on the node, but when I run bcbio it still can't find the package, so I think DEXSeq is not being picked up by the bcbio run.
I also found the DEXSeq package in ./tools/lib/R/site-library, so I tried to use it by setting R_LIBRARY_PATH, but the same error occurred.
Can you show me how to add the DEXSeq package to R under the bcbio environment? Thanks.

Shangqian

@roryk
Collaborator

roryk commented Dec 2, 2014

Hi Shangqian,

I agree, that seems like it should work, thanks for helping to debug this. Hmm-- if you type:

Rscript -e 'find.package("DEXSeq")'

Does it output a directory or say the package cannot be found? If it works, does it also work with R_LIBRARY_PATH unset?
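
As an aside, R's standard environment variable for extra library paths is R_LIBS (rather than the R_LIBRARY_PATH mentioned above), so one hedged check against the bcbio site-library would be:

R_LIBS=/public/software/bcbio-nextgen/tools/lib/R/site-library Rscript -e 'find.package("DEXSeq")'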

chapmanb added a commit that referenced this issue Dec 2, 2014
…ndle GFF retrieval when no GTF file in DEXSeq unit tests. Fixes #653
@chapmanb
Member

chapmanb commented Dec 2, 2014

Shangqian;
Apologies, @roryk and I traced this back to bcbio not injecting the installed site-libraries for R into the search path when looking for DEXSeq. I pushed a fix which does this, so if you upgrade to the latest development version:

bcbio_nextgen.py upgrade -u development 

it should hopefully work cleanly now. Thanks for the bug report and hope this fixes it for you.

@shang-qian
Author

Brad and Roryk,
Thanks for the fix. I am upgrading the bcbio now.

By the way, in my earlier whole genome SV tests, bcbio ran normally. But five days ago I submitted real lung cancer data for SV analysis, and this error appeared this morning:

[2014-12-03 09:01] Index BAM file: 1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam
[2014-12-03 09:01] Samtools-htslib-API: bam_index_build2() not yet implemented
[2014-12-03 09:01] /bin/bash: line 1: 26699 Aborted /public/software/bcbio-nextgen/tools/bin/samtools index /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/tx/tmpdRaD64/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam.bai
[2014-12-03 09:01] Index BAM file (single core): 1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam
[2014-12-03 09:01] Samtools-htslib-API: bam_index_build2() not yet implemented
[2014-12-03 09:01] /bin/bash: line 1: 26702 Aborted /public/software/bcbio-nextgen/tools/bin/samtools index /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/tx/tmpdRaD64/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam.bai
[2014-12-03 09:01] Uncaught exception occurred
Traceback (most recent call last):
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
_do_run(cmd, checks, log_stdout)
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /public/software/bcbio-nextgen/tools/bin/samtools index /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/tx/tmpdRaD64/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam.bai
Samtools-htslib-API: bam_index_build2() not yet implemented
/bin/bash: line 1: 26702 Aborted /public/software/bcbio-nextgen/tools/bin/samtools index /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/tx/tmpdRaD64/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam.bai
' returned non-zero exit status 134

I can't find the cause of this error, because the same command ran normally for another bam file. Can you help me? Thanks.

Shangqian

chapmanb added a commit that referenced this issue Dec 3, 2014
@chapmanb
Member

chapmanb commented Dec 3, 2014

Shangqian;
Thanks for the report. The new version of samtools index does not support specifying the output .bam.bai file, which triggered this error. I'm confused as to why the code used samtools for indexing, since it should use sambamba index by default; perhaps there is something problematic about your sambamba install. Either way, I pushed a small fix to work around this issue, so if you update it should hopefully work cleanly now. Thanks again.
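
For reference, a sketch of the two indexing styles with a hypothetical input name: newer samtools releases write the .bai next to the BAM, while sambamba index accepts an explicit output path:

samtools index sample.bam
sambamba index sample.bam sample.bam.bai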

@shang-qian
Author

Brad,
I am upgrading bcbio, but when it runs:
[localhost] local: /public/software/bcbio-nextgen/tools/bin/brew info speedseq
it exits with an error:
Fatal error: local() encountered an error (return code 1) while executing '/public/software/bcbio-nextgen/tools/bin/brew info speedseq'

I ran it 5 times and got the same error every time. Can you check this?
Thanks.

@chapmanb
Member

chapmanb commented Dec 3, 2014

Shangqian;
Sorry about the problem. I'm not sure why that command would fail. Does it provide any useful error messages if you run it outside of the upgrade process?

/public/software/bcbio-nextgen/tools/bin/brew info speedseq

@shang-qian
Author

Hi Brad,
it gives the following messages:
[root@compute-0-15 bin]# /public/software/bcbio-nextgen/tools/bin/brew info speedseq
speedseq: stable 2014-08-22
https://github.com/cc2qe/speedseq
/public/software/bcbio-nextgen/tools/Cellar/speedseq/2014-08-22 (4 files, 92K) *
Built from source
From: https://github.com/chapmanb/homebrew-cbl/blob/master/speedseq.rb
==> Dependencies
Error: No available formula for sambamba

Is this problem caused by the sambamba package? How can I fix it?

@chapmanb
Member

chapmanb commented Dec 3, 2014

Shangqian;
That's strange, it seems like your recipes are not getting updated since sambamba should be present in homebrew-science. This should happen automatically but you can run:

/public/software/bcbio-nextgen/tools/bin/brew update

which should pull it in. Hope this helps.

@shang-qian
Author

Brad,
Thanks for your response. When I run the command in question, it yields the error below:

[root@compute-0-15 bin]# /public/software/bcbio-nextgen/tools/bin/brew update
Unpacking objects: 100% (12/12), done.
error: Your local changes to 'bedtools.rb' would be overwritten by merge. Aborting.
Please, commit your changes or stash them before you can merge.
Error: Failed to update tap: homebrew/science
Already up-to-date.

Should I remove homebrew/science and re-upgrade?

@chapmanb
Member

chapmanb commented Dec 4, 2014

Shangqian;
I'm not sure how the bedtools formula got changed manually but that explains the issues. You can fix with:

cd /public/software/bcbio-nextgen/tools/Library/Taps/homebrew/homebrew-science
git checkout bedtools.rb

then you should be able to re-run the updater and find everything working. Hope this helps figure it out.

@shang-qian
Author

Brad,

Thanks for your advice. I upgraded bcbio and DEXSeq is working now, but when I test the exome pipeline, this command:
[2014-12-07 20:54] java -Xms750m -Xmx2500m -Djava.io.tmpdir=/public/users/xieshangqian/Testcode/testdata/bcbio/work/ensemble/test/tmp -jar /public/software/bcbio-nextgen/tools/share/java/bcbio_variation/bcbio.variation-0.1.9-standalone.jar variant-ensemble /public/users/xieshangqian/Testcode/testdata/bcbio/work/ensemble/test/config/test-ensemble.yaml /public/software/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa /public/users/xieshangqian/Testcode/testdata/bcbio/work/ensemble/test/test-ensemble.vcf /public/users/xieshangqian/Testcode/testdata/bcbio/work/gatk-haplotype/test-effects-ploidyfix-combined-gatkclean.vcf.gz /public/users/xieshangqian/Testcode/testdata/bcbio/work/freebayes/test-effects-ploidyfix-filter.vcf.gz /public/users/xieshangqian/Testcode/testdata/bcbio/work/samtools/test-effects-ploidyfix-filter.vcf.gz

it yields the following:
[2014-12-07 20:59] Exception in thread "main" java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
[2014-12-07 20:59] at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsInt(CommonInfo.java:242)
[2014-12-07 20:59] at htsjdk.variant.variantcontext.VariantContext.getAttributeAsInt(VariantContext.java:703)
[2014-12-07 20:59] at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.simpleMerge(GATKVariantContextUtils.java:946)
[2014-12-07 20:59] at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:309)
[2014-12-07 20:59] at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:117)
[2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
[2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
[2014-12-07 20:59] at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
[2014-12-07 20:59] at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
[2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
[2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
[2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
[2014-12-07 20:59] at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
[2014-12-07 20:59] at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
[2014-12-07 20:59] at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
[2014-12-07 20:59] at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
[2014-12-07 20:59] at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
[2014-12-07 20:59] at bcbio.run.broad$run_gatk$fn__1805.invoke(broad.clj:34)
[2014-12-07 20:59] at bcbio.run.broad$run_gatk.invoke(broad.clj:31)
[2014-12-07 20:59] at bcbio.variation.combine$combine_variants.doInvoke(combine.clj:71)
[2014-12-07 20:59] at clojure.lang.RestFn.invoke(RestFn.java:1557)
[2014-12-07 20:59] at bcbio.variation.recall$get_min_merged.invoke(recall.clj:158)
[2014-12-07 20:59] at bcbio.variation.recall$fn__7040.invoke(recall.clj:173)
[2014-12-07 20:59] at clojure.lang.MultiFn.invoke(MultiFn.java:249)
[2014-12-07 20:59] at bcbio.variation.recall$create_merged$fn__7045.invoke(recall.clj:187)
[2014-12-07 20:59] at clojure.core$map$fn__4207.invoke(core.clj:2487)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484)
[2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133)
[2014-12-07 20:59] at clojure.core$map$fn__4214.invoke(core.clj:2496)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484)
[2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133)
[2014-12-07 20:59] at clojure.core$map$fn__4207.invoke(core.clj:2479)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484)
[2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133)
[2014-12-07 20:59] at clojure.core$map$fn__4211.invoke(core.clj:2490)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484)
[2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133)
[2014-12-07 20:59] at clojure.core$map$fn__4207.invoke(core.clj:2479)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484)
[2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133)
[2014-12-07 20:59] at clojure.core$map$fn__4214.invoke(core.clj:2496)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484)
[2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133)
[2014-12-07 20:59] at clojure.core$map$fn__4207.invoke(core.clj:2479)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484)
[2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133)
[2014-12-07 20:59] at clojure.core$reduce1.invoke(core.clj:890)
[2014-12-07 20:59] at clojure.core$reverse.invoke(core.clj:904)
[2014-12-07 20:59] at clojure.math.combinatorics$combinations.invoke(combinatorics.clj:73)
[2014-12-07 20:59] at bcbio.variation.compare$variant_comparison_from_config$iter__7582__7586$fn__7587.invoke(compare.clj:255)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484)
[2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133)
[2014-12-07 20:59] at clojure.core$tree_seq$walk__4647$fn__4648.invoke(core.clj:4475)
[2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42)
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60)
[2014-12-07 20:59] at clojure.lang.LazySeq.more(LazySeq.java:96)
[2014-12-07 20:59] at clojure.lang.RT.more(RT.java:607)
[2014-12-07 20:59] at clojure.core$rest.invoke(core.clj:73)
[2014-12-07 20:59] at clojure.core$flatten.invoke(core.clj:6478)
[2014-12-07 20:59] at bcbio.variation.compare$variant_comparison_from_config.invoke(compare.clj:254)
[2014-12-07 20:59] at bcbio.variation.ensemble$consensus_calls.invoke(ensemble.clj:113)
[2014-12-07 20:59] at bcbio.variation.ensemble$_main.doInvoke(ensemble.clj:133)
[2014-12-07 20:59] at clojure.lang.RestFn.applyTo(RestFn.java:137)
[2014-12-07 20:59] at clojure.core$apply.invoke(core.clj:617)
[2014-12-07 20:59] at bcbio.variation.core$_main.doInvoke(core.clj:35)
[2014-12-07 20:59] at clojure.lang.RestFn.applyTo(RestFn.java:137)
[2014-12-07 20:59] at bcbio.variation.core.main(Unknown Source)

This command has been running for at least one day (it is still running now).
Before the upgrade, I know it did not take this long, so I think something may need fixing. Is this normal?
Thanks again.

Shangqian

@chapmanb
Member

chapmanb commented Dec 8, 2014

Shangqian;
Sorry about the issue. This is from a problem with vcfallelicprimitives and multi-allele sites and was recently fixed in the development version. See more details here:

#679 (comment)

If you upgrade with bcbio_nextgen.py upgrade -u development, remove the freebayes and checkpoints directories (rm -rf freebayes && rm -rf checkpoints_parallel), and re-run it should hopefully work cleanly. Hope this fixes it for you.

@shang-qian
Author

Thanks, Brad. The upgrade solved the problem. Thanks again.

@shang-qian
Author

Brad and Roryk,
Many thanks for all your help.

Our HPC has 32 nodes (20 cores each) and uses the PBS scheduler for job submission. I previously submitted a PBS file to run bcbio on a single node and it worked well. Now I need to analyse many samples with bcbio, so I want to run across multiple nodes as Rory suggested. The following is my test PBS file for nodes c13 and c17:

#PBS -N exome_s10
#PBS -j oe
#PBS -l nodes=c13:ppn=3+c17:ppn=5
#PBS -l walltime=5000:00:00
#PBS -q high
cd ~/Testcode/testdata/bcbio/work/
bcbio_nextgen.py ../config/test_exome_single.yaml -t ipython -n 8 -s torque -q high

When I qsub this file, it yields an error like this:

[2014-12-10 11:45] compute-0-13.local: Resource requests: bwa, sambamba, samtools; memory: 2.0; cores: 16, 1, 16
[2014-12-10 11:45] compute-0-13.local: Configuring 1 jobs to run, using 8 cores each with 16.2g of memory reserved for each job
[ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipython_config.py'
[ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipython_notebook_config.py'
[ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipython_nbconvert_config.py'
[ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipcontroller_config.py'
[ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipengine_config.py'
[ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipcluster_config.py'
[ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/iplogger_config.py'
2014-12-10 11:45:36.491 [IPClusterStart] Config changed:
2014-12-10 11:45:36.491 [IPClusterStart] {'BcbioTORQUEEngineSetLauncher': {'mem': '16.2', 'cores': 8, 'tag': '', 'resources': ''}, 'IPClusterEngines': {'early_shutdown': 240}, 'Application': {'log_level': 10}, 'ProfileDir': {'location': u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython'}, 'BaseParallelApplication': {'log_to_file': True, 'cluster_id': u'e1bf1e39-9d63-4884-ba38-345be349dbd2'}, 'TORQUELauncher': {'queue': 'high'}, 'BcbioTORQUEControllerLauncher': {'mem': '16.2', 'cores': 2, 'tag': '', 'resources': ''}, 'IPClusterStart': {'delay': 10, 'n': 1, 'daemonize': True, 'engine_launcher_class': u'cluster_helper.cluster.BcbioTORQUEEngineSetLauncher', 'controller_launcher_class': u'cluster_helper.cluster.BcbioTORQUEControllerLauncher'}}
2014-12-10 11:45:36.503 [IPClusterStart] Using existing profile dir: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython'
2014-12-10 11:45:36.504 [IPClusterStart] Searching path [u'/public/users/xieshangqian/Testcode/testdata/bcbio/work', u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython'] for config files
2014-12-10 11:45:36.504 [IPClusterStart] Attempting to load config file: ipython_config.py
2014-12-10 11:45:36.505 [IPClusterStart] Loaded config file: /public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipython_config.py
2014-12-10 11:45:36.506 [IPClusterStart] Attempting to load config file: ipcluster_e1bf1e39_9d63_4884_ba38_345be349dbd2_config.py
2014-12-10 11:45:36.507 [IPClusterStart] Loaded config file: /public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipcontroller_config.py
2014-12-10 11:45:36.507 [IPClusterStart] Attempting to load config file: ipcluster_e1bf1e39_9d63_4884_ba38_345be349dbd2_config.py
2014-12-10 11:45:36.508 [IPClusterStart] Loaded config file: /public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipengine_config.py
2014-12-10 11:45:36.509 [IPClusterStart] Attempting to load config file: ipcluster_e1bf1e39_9d63_4884_ba38_345be349dbd2_config.py
2014-12-10 11:45:36.510 [IPClusterStart] Loaded config file: /public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipcluster_config.py
2014-12-10 12:01:09.032 [IPClusterStop] Using existing profile dir: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython'
2014-12-10 12:01:09.094 [IPClusterStop] Stopping cluster [pid=21885] with [signal=2]
Traceback (most recent call last):
File "/public/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 216, in
main(**kwargs)
File "/public/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 42, in main
run_main(**kwargs)
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 45, in run_main
fc_dir, run_info_yaml)
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 81, in _run_toplevel
for xs in pipeline.run(config, run_info_yaml, parallel, dirs, samples):
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 140, in run
multiplier=alignprep.parallel_multiplier(samples)) as run_parallel:
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/contextlib.py", line 17, in enter
return self.gen.next()
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/prun.py", line 53, in start
with ipython.create(parallel, dirs, config) as view:
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/contextlib.py", line 17, in enter
return self.gen.next()
File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 913, in cluster_view
raise IOError("Cluster startup timed out.")
IOError: Cluster startup timed out.

So I am a little confused: have I understood your advice correctly? Do I need to modify my PBS file or the bcbio system? By the way, after I qsub the PBS file, the job only runs on node c13, not on c17. Thanks again for your kind responses.

Shangqian

@roryk
Collaborator

roryk commented Dec 10, 2014

Hi Shangqian,

Is your HPC busy? If you have to wait a long time for a job to start, bcbio-nextgen will time out. When you submitted it, were the jobs pending for a long time, or did they move to running status? If they are pending and bcbio-nextgen is timing out, you can make bcbio-nextgen wait longer by adding --timeout time-in-minutes to the bcbio_nextgen.py command, so it won't time out while it is waiting. Hope that helps; let us know how it goes.

Best,

Rory
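
For example, extending the command from the PBS script above (the 120-minute value is illustrative):

bcbio_nextgen.py ../config/test_exome_single.yaml -t ipython -n 8 -s torque -q high --timeout 120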

@shang-qian
Author

Hi Rory,
I checked our HPC nodes; they were all idle before I submitted my job.
OK, I will try adding the --timeout option. If there are more problems, I will still need your help :). Thanks.
Best,
Shangqian

@roryk
Collaborator

roryk commented Dec 10, 2014

Hi Shangqian,

If the nodes were idle, then it might be an issue running on Torque. When you submit the job, does everything get to the running state and still time out, or are the jobs pending? If the jobs are in the running state but it still times out, that would be very helpful to know.

@shang-qian
Author

Hi Rory,
The jobs are in the running state, and the same error happened.

@roryk
Collaborator

roryk commented Dec 10, 2014

Great. When the jobs are in the running state, are a controller job and an engine job both running too? There should be three jobs running: the bcbio_nextgen job from the script you wrote to submit to the scheduler, plus a controller and a set of engines. Were all of those running, or just the one bcbio_nextgen job?

If the controller and engine jobs were running too, are there files with engine and ipcontroller in their names in your directory? If you look at those, you should see some errors about heartbeats between the engine and controller. Do you see something like that?

@shang-qian
Author

Thanks, Rory
This is the qstat result:
Job id Name User Time Use S Queue
11063.cluster exome_s10 xieshangqian 00:00:05 R high

and top shows that bcbio_nextgen.py is running.
How should I check whether the controller or engine jobs are running?

@roryk
Collaborator

roryk commented Dec 10, 2014

They should be appearing on there if they are running, so it seems like they aren't starting. Are there engine and ipcontroller files in the directory? There should be job submission scripts for each of them.

@shang-qian
Author

Yes, they are in the lib and pkgs folders:
/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/sqlalchemy/engine
/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/IPython/parallel/engine
/public/software/bcbio-nextgen/anaconda/pkgs/sqlalchemy-0.9.7-py27_0/lib/python2.7/site-packages/sqlalchemy/engine
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.3.1-py27_0/lib/python2.7/site-packages/IPython/parallel/engine
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.2.0-py27_0/lib/python2.7/site-packages/IPython/parallel/engine
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.3.0-py27_0/lib/python2.7/site-packages/IPython/parallel/engine

/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.3.1-py27_0/bin/ipcontroller
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.2.0-py27_0/bin/ipcontroller
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.3.0-py27_0/bin/ipcontroller
/public/software/bcbio-nextgen/anaconda/bin/ipcontroller

@roryk
Collaborator

roryk commented Dec 10, 2014

Hm-- nothing in the work directory? They should look like torque_controller with a bunch of letters and numbers after them. If we can track down those files we can try to figure out why they aren't getting run.

@shang-qian
Author

Yes, I found them in the work directory. The contents are:
Controller:
#!/bin/sh
#PBS -q high
#PBS -V
#PBS -N bcbio-c
#PBS -j oe
#PBS -l nodes=1:ppn=2
#PBS -l walltime=239:00:00
cd $PBS_O_WORKDIR
/public/software/bcbio-nextgen/anaconda/bin/python2.7 -E -c 'import resource; cur_proc, max_proc = resource.getrlimit(resource.RLIMIT_NPROC); target_proc = min(max_proc, 10240) if max_proc > 0 else 10240; resource.setrlimit(resource.RLIMIT_NPROC, (max(cur_proc, target_proc), max_proc)); cur_hdls, max_hdls = resource.getrlimit(resource.RLIMIT_NOFILE); target_hdls = min(max_hdls, 10240) if max_hdls > 0 else 10240; resource.setrlimit(resource.RLIMIT_NOFILE, (max(cur_hdls, target_hdls), max_hdls)); from cluster_helper.cluster import VMFixIPControllerApp; VMFixIPControllerApp.launch_instance()' --ip=* --log-to-file --profile-dir="/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython" --cluster-id="3b8f9f6a-98f4-47d4-a9db-d3f15d4f3669" --nodb --hwm=1 --scheme=leastload --HeartMonitor.max_heartmonitor_misses=120 --HeartMonitor.period=60000

Engines:
#!/bin/sh
#PBS -q high
#PBS -V
#PBS -j oe
#PBS -N bcbio-e
#PBS -t 1-1
#PBS -l nodes=1:ppn=5
#PBS -l mem=10444mb
#PBS -l walltime=239:00:00
cd $PBS_O_WORKDIR
/public/software/bcbio-nextgen/anaconda/bin/python2.7 -E -c 'import resource; cur_proc, max_proc = resource.getrlimit(resource.RLIMIT_NPROC); target_proc = min(max_proc, 10240) if max_proc > 0 else 10240; resource.setrlimit(resource.RLIMIT_NPROC, (max(cur_proc, target_proc), max_proc)); cur_hdls, max_hdls = resource.getrlimit(resource.RLIMIT_NOFILE); target_hdls = min(max_hdls, 10240) if max_hdls > 0 else 10240; resource.setrlimit(resource.RLIMIT_NOFILE, (max(cur_hdls, target_hdls), max_hdls)); from IPython.parallel.apps.ipengineapp import launch_new_instance; launch_new_instance()' --timeout=960 --IPEngineApp.wait_for_url_file=960 --EngineFactory.max_heartbeat_misses=120 --profile-dir="/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython" --cluster-id="3b8f9f6a-98f4-47d4-a9db-d3f15d4f3669"

@chapmanb
Member

Shangqian;
Thanks for the help debugging. If you manually submit one of these:

qsub torque_controller*

does it provide any useful error messages? It sounds like something with the submission is problematic with your setup and maybe this will provide a clue. Thanks again.

@lpantano
Collaborator

Hi,

I would try to submit one of these files. Normally they don't start because something in them clashes with the cluster configuration. If you submit one of these files on its own, you can check for any error with them that we didn't think of.

It happened to me on a queue where, for instance, you had to submit jobs with more than two cores: the cluster manager would not let those jobs into the queue and bcbio got stuck.


@shang-qian
Author

Hi lpantano,
Thanks for your trying, I had also qsub the torque_controller* file as Brad's advice, The result is similar to you. It is in the running state in one node with two cores, but the time use is always zero. And in fact, it doesn't work in the node that I qsub.

@lpantano
Collaborator

Hi,

Could you run a file with the same header but different contents, just to make sure the problem is only related to the cluster and not to ipython or bcbio? Then check the output files and whether it finishes. Something like:

#!/bin/sh
#PBS -q high
#PBS -V
#PBS -N bcbio-c
#PBS -j oe
#PBS -l nodes=1:ppn=2
#PBS -l walltime=239:00:00

cd /SOME/PATH/WITH/FILES
sleep 10
ls

