problem with fastq files #13

Closed
maxulysse opened this issue May 10, 2016 · 8 comments

@maxulysse

Instrument IDs need to be fixed in:
HCC1954.normal_S22_L001_R1_001.fastq.gz
HCC1954.normal_S22_L001_R2_001.fastq.gz
HCC1954.tumor2_S24_L002_R1_001.fastq.gz
HCC1954.tumor2_S24_L002_R2_001.fastq.gz
HCC1954.tumor3_S25_L001_R1_001.fastq.gz
HCC1954.tumor3_S25_L001_R2_001.fastq.gz
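A quick way to see what needs fixing (a minimal sketch, assuming standard Illumina-style @instrument:run:flowcell:lane:... read headers) is to print the first header line of each file:

$ for f in HCC1954.*_001.fastq.gz; do echo -n "$f "; zcat "$f" | head -1; done

If a file carries a wrong instrument ID, it could be rewritten on the fly; @OLD_ID and @NEW_ID below are placeholders, not the actual IDs in these files:

$ zcat HCC1954.normal_S22_L001_R1_001.fastq.gz | sed 's/^@OLD_ID:/@NEW_ID:/' | gzip > HCC1954.normal_S22_L001_R1_001.fixed.fastq.gz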

@maxulysse maxulysse assigned maxulysse and szilvajuhos and unassigned maxulysse May 10, 2016
@maxulysse maxulysse added the bug label May 10, 2016
@maxulysse

It is causing problems in the mapping step, which fails, so everything downstream fails as well. @pallolason believed his script wasn't working, but I think it is actually working fine.

@szilvajuhos

szilva@milou1 ~/dev/CAW $ ./nextflow run multiFQ.nf --sample newsample.tsv
[...]
Error executing process > 'MarkDuplicates (1)'

Caused by:
  Process 'MarkDuplicates (1)' terminated with an error exit status

Command executed:
  echo [HCC1954, HCC1954] HCC1954.tumor_2 HCC1954.tumor_2_1:HCC1954.tumor_2_2 HCC1954.tumor_2.bam > ble
  java -Xmx7g -jar /sw/apps/bioinfo/picard/1.118/milou/MarkDuplicates.jar INPUT=HCC1954.tumor_2.bam METRICS_FILE=HCC1954.tumor_2.bam.metrics TMP_DIR=. ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE OUTPUT=HCC1954.tumor_2.md.bam

Command exit status:
1
Command output:
(empty)
Command error:
/etc/profile.d/modules.sh: line 86: PS1: unbound variable
[Wed May 11 12:01:25 CEST 2016] picard.sam.MarkDuplicates INPUT=[HCC1954.tumor_2.bam] OUTPUT=HCC1954.tumor_2.md.bam METRICS_FILE=HCC1954.tumor_2.bam.metrics ASSUME_SORTED=true TMP_DIR=[.] VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
[Wed May 11 12:01:25 CEST 2016] Executing as szilva@milou1.uppmax.uu.se on Linux 2.6.32-573.22.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15; Picard version: 1.118(2329276ea55d31ab6b19bab55b9ee7b51e4a446e_1406559781) IntelDeflater
INFO 2016-05-11 12:01:26 MarkDuplicates Start of doWork freeMemory: 2014301840; totalMemory: 2025848832; maxMemory: 6681067520
INFO 2016-05-11 12:01:26 MarkDuplicates Reading input file and constructing read end information.
INFO 2016-05-11 12:01:26 MarkDuplicates Will retain up to 26512172 data points before spilling to disk.
[Wed May 11 12:01:26 CEST 2016] picard.sam.MarkDuplicates done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2025848832
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" java.lang.IllegalArgumentException: Cannot add sequence that already exists in SAMSequenceDictionary: 1
at htsjdk.samtools.SAMSequenceDictionary.setSequences(SAMSequenceDictionary.java:68)
at htsjdk.samtools.SAMSequenceDictionary.<init>(SAMSequenceDictionary.java:45)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:113)
at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:502)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:165)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:124)
at htsjdk.samtools.SAMFileReader.init(SAMFileReader.java:689)
at htsjdk.samtools.SAMFileReader.<init>(SAMFileReader.java:201)
at htsjdk.samtools.SAMFileReader.<init>(SAMFileReader.java:156)
at picard.sam.MarkDuplicates.openInputs(MarkDuplicates.java:355)
at picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:405)
at picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:177)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
at picard.sam.MarkDuplicates.main(MarkDuplicates.java:161)

Work dir:
/gulo/glob/szilva/private/CAW/work/74/7a089fa13eee32ca6af05e57aeaf59
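The exception points at a duplicated sequence name in the BAM header, which would suggest the merge step wrote the same @SQ lines more than once. A quick check (a sketch, assuming samtools is on the PATH and the command is run from the work dir above):

$ samtools view -H HCC1954.tumor_2.bam | grep '^@SQ' | sort | uniq -d

Any output means the header carries duplicate @SQ entries, which MarkDuplicates refuses to load.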

@szilvajuhos

szilvajuhos commented May 11, 2016

When using only normal[12] and tumor[12] it is OK; when adding tumor3, it crashes. Milou is going down soon, so I could not investigate further, but it looks like it is not the actual files but the number of files that matters: using normal[12] and tumor[13] also runs just fine.

@pallolason

I never saw this error - MarkDuplicates always worked fine for me.

@szilvajuhos

If I check out the last version Pall committed (2ad9d4c 2016-05-09 | Merge branch 'mergebambysample' [Pall Isolfur Olason]) it works fine :S
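For reference, reproducing that is just a detached checkout and a rerun (assuming a local clone of the repo):

$ git checkout 2ad9d4c
$ ./nextflow run multiFQ.nf --sample newsample.tsv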

@pallolason

Yes - this is the current master running fine:

$ ../nextflow run multiFQ.nf --sample newsample.tsv
N E X T F L O W ~ version 0.17.3
Launching multiFQ.nf
[warm up] executor > local
[71/bc6aaa] Submitted process > MappingBwa (3)
[33/019da8] Submitted process > MappingBwa (5)
[17/6d2d95] Submitted process > MappingBwa (7)
[24/9a9295] Submitted process > MappingBwa (4)
[ab/ed4063] Submitted process > MappingBwa (8)
[f3/19a6ce] Submitted process > MappingBwa (6)
[44/7dc79f] Submitted process > MappingBwa (2)
[6e/ea81d2] Submitted process > MappingBwa (1)
[e9/1b698f] Submitted process > MergeBam (4)
[9d/89bbaf] Submitted process > MergeBam (3)
[0b/f54181] Submitted process > MergeBam (2)
[d8/96cdc7] Submitted process > MergeBam (1)
[86/6c307e] Submitted process > MarkDuplicates (1)
[e9/5531b2] Submitted process > MarkDuplicates (2)
[41/b33f85] Submitted process > MarkDuplicates (3)
[bd/f136d0] Submitted process > MarkDuplicates (4)

$ cat newsample.tsv | cut -f4 | while read i; do echo -n "$i "; zcat $i | wc -l; done
/sw/data/uppnex/ToolBox/TCGAbenchmark/chr17_multilane/HCC1954.normal_S22_L001_R2_001.fastq.gz 10000
/sw/data/uppnex/ToolBox/TCGAbenchmark/chr17_multilane/HCC1954.normal_S22_L002_R2_001.fastq.gz 10012
/sw/data/uppnex/ToolBox/TCGAbenchmark/chr17_multilane/HCC1954.tumor1_S23_L001_R2_001.fastq.gz 7000
/sw/data/uppnex/ToolBox/TCGAbenchmark/chr17_multilane/HCC1954.tumor1_S23_L002_R2_001.fastq.gz 6116
/sw/data/uppnex/ToolBox/TCGAbenchmark/chr17_multilane/HCC1954.tumor2_S24_L001_R2_001.fastq.gz 3296
/sw/data/uppnex/ToolBox/TCGAbenchmark/chr17_multilane/HCC1954.tumor2_S24_L002_R2_001.fastq.gz 3296
/sw/data/uppnex/ToolBox/TCGAbenchmark/chr17_multilane/HCC1954.tumor3_S25_L001_R2_001.fastq.gz 3600
/sw/data/uppnex/ToolBox/TCGAbenchmark/chr17_multilane/HCC1954.tumor3_S25_L002_R2_001.fastq.gz 3716

$ find work/ -name "*.md.bam" | while read i; do echo -n "$i "; samtools view $i | wc -l; done
work/e9/5531b25438e1c072cb40535943f04b/HCC1954.tumor_2.md.bam 3303
work/bd/f136d0f18411b316424f6617773e0a/HCC1954.normal.md.bam 10022
work/86/6c307e0b4f74d53b9b8f6f60d4be57/HCC1954.tumor_3.md.bam 3666
work/41/b33f8535dc6ea9c4f225fe1eb8ccd7/HCC1954.tumor_1.md.bam 6561


@szilvajuhos

Frankly, I cannot figure out what the real cause is. Running like
szilva@milou2 ~/dev/CAW $ interactive -p devel -A b2013064 -t 60 nextflow -c milou.config run multiFQ.nf --sample newsample.tsv
it never fails, and it works on my laptop; the only case where it dies is when running directly in milou's terminal. It is likely running out of resources, or something else.
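One way to test the resource hypothesis (an assumption, not a confirmed cause) is to compare the shell limits on the login node against an interactive job, since a tight virtual-memory limit can kill the JVM before Picard even starts:

$ ulimit -a             # compare 'virtual memory' and 'max memory size' in both environments
$ java -Xmx7g -version  # fails with "Could not reserve enough space" if the heap cannot be reserved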

@szilvajuhos

To check the status of a pipe we can have a look at the PIPESTATUS shell variable, like:

$ ls|head|cat - > /dev/null
$ echo ${PIPESTATUS[@]}
0 0 0

See http://stackoverflow.com/questions/1221833/bash-pipe-output-and-capture-exit-status
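A failing stage shows up in the corresponding slot, and in scripts it may be simpler to make the whole pipe fail fast with the standard bash option (a sketch, not something CAW currently sets):

$ false | true
$ echo ${PIPESTATUS[@]}
1 0
$ set -o pipefail       # pipeline exit status becomes the last non-zero one
$ false | true; echo $?
1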

maxulysse referenced this issue in maxulysse/nf-core_sarek Jun 14, 2019
* Add BamQC, CompressVCFsnpEff and CompressVCFvep processes
* Add Citation documentation
* Fix merge in annotation
* Merge BamQCmapped and BamQCrecalibrated processes into BamQC process
* Removed BamQCmapped, BamQCrecalibrated and CompressVCF processes
* Split CompressVCF process into CompressVCFsnpEff and CompressVCFvep processes