corrupted MAP output is sometime generated by bowtie 1.2.1 #54

Open
ndaniel opened this Issue Jun 13, 2017 · 6 comments

ndaniel commented Jun 13, 2017 edited

Occasionally, and seemingly at random, Bowtie 1.2.1 generates a corrupted MAP file (MAP is the default Bowtie output format) in which, for example, the read ID appears in the second column instead of the first.

If Bowtie 1.2.1 is re-run with the same command-line parameters and inputs, it runs fine and produces a good MAP file. In neither case is any error or warning shown.

This behavior has not been noticed with Bowtie 1.X or 1.2.

ndaniel changed the title from corrputed MAP output is sometime generated by bowtie 1.2.1 to corrupted MAP output is sometime generated by bowtie 1.2.1 Jun 13, 2017

Owner

BenLangmead commented Jun 13, 2017

Can you share at least the command line you used when running bowtie? And, preferably, also the reads and index?

ndaniel commented Jun 13, 2017

I will try to share it (when I come across it again), but the problem is not reproducible!

ndaniel commented Jun 14, 2017 edited

Here is the exact command line with which Bowtie 1.2.1 (pre-built binaries for Linux x64, running on Ubuntu 16.04 LTS) very rarely, and randomly, produces a corrupted MAP file:

bowtie \
-t  \
-k 200 \
-v 2 \
-p 4 \
--phred33-quals  \
--suppress 2,4,5,6,7,8 \
--chunkmbs 128 \
 /tmp/transcriptome_index/ \
 /tmp/reads.fq \
2> /tmp/log.stdout.txt \
|  \
LC_ALL=C  \
grep  \
-v  \
-F  \
-f /tmp/special.txt \
> /tmp/transcriptome.map

where the corrupted MAP file contains malformed lines such as:

9z/1	9p/1	ENSG00000184507  <= two read ids instead of one and should be 2 columns not 3
	9p/1	ENSG00000184507  <= one read id but should be 2 columns not 3
9p/1	ENSG00000184507
9p/1	ENSG00000184507

whereas one would expect something like this (two columns):

9p/1	ENSG00000184507
9p/1	ENSG00000184507
9p/1	ENSG00000184507

Please note again that this is very hard to reproduce, even for me! I have to re-run the command about 20 times to get a single corrupted MAP file.
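
The repeated re-running described above could be automated. The following is only a minimal sketch, not a tested reproduction script: the helper names are hypothetical, the `grep` filtering stage of the original pipeline is left out, and a line is treated as corrupted simply when it does not have exactly two non-empty tab-separated columns. The flags in `BOWTIE_COMMAND` are copied from the command line in this report.

```python
import subprocess

# Flags copied from the report; the grep stage of the pipeline is omitted here.
BOWTIE_COMMAND = [
    "bowtie", "-t", "-k", "200", "-v", "2", "-p", "4",
    "--phred33-quals", "--suppress", "2,4,5,6,7,8", "--chunkmbs", "128",
    "/tmp/transcriptome_index/", "/tmp/reads.fq",
]


def is_well_formed(line, expected_columns=2):
    """Return True if the line has exactly the expected non-empty columns."""
    fields = line.split("\t")
    return len(fields) == expected_columns and all(fields)


def first_corrupt_attempt(command, max_attempts=20):
    """Run the command up to max_attempts times.

    Returns (attempt_number, malformed_lines) for the first run whose
    standard output contains malformed lines, or None if every run is clean.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
        malformed = [
            line for line in result.stdout.splitlines()
            if not is_well_formed(line)
        ]
        if malformed:
            return attempt, malformed
    return None
```

Calling `first_corrupt_attempt(BOWTIE_COMMAND)` on a machine with the affected binary and the same inputs would stop at the first corrupted run. The default of 20 attempts mirrors the estimate above, and a higher limit may be needed in practice.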

ndaniel commented Jun 14, 2017

This bug also affects Bowtie 1.2.1.1, where I still get output like this:

Cx/1	ENSG00000068078
CJ/2	CZ/1	CJ/2	ENSG00000068078
CZ/1	ENSG00000184507
CL/2	ENSG00000068078
CL/2	Cf/1	ENSG00000068078
CS/1	ENSG00000182944
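
Malformed lines like these can be detected mechanically. The sketch below assumes, as in this report, that a valid line has exactly two non-empty tab-separated columns (read ID and reference name, given `--suppress 2,4,5,6,7,8`). The function name is hypothetical and the sample lines are the ones quoted above.

```python
def find_malformed_lines(lines, expected_columns=2):
    """Return (line_number, line) pairs for lines that do not have
    exactly expected_columns non-empty tab-separated fields."""
    malformed = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        fields = line.split("\t")
        if len(fields) != expected_columns or not all(fields):
            malformed.append((number, line))
    return malformed


# Sample taken from the Bowtie 1.2.1.1 output quoted in this report.
sample = [
    "Cx/1\tENSG00000068078",
    "CJ/2\tCZ/1\tCJ/2\tENSG00000068078",
    "CZ/1\tENSG00000184507",
    "CL/2\tENSG00000068078",
    "CL/2\tCf/1\tENSG00000068078",
    "CS/1\tENSG00000182944",
]

for number, line in find_malformed_lines(sample):
    print(f"line {number}: {line!r}")
```

On this sample the check flags lines 2 and 5, the two lines that carry extra read IDs.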

ndaniel commented Jun 14, 2017 edited

This is the log from a run of Bowtie 1.2.1.1 that produced a corrupted MAP file:

Time loading forward index: 00:00:00
Time loading mirror index: 00:00:01
End-to-end 2/3-mismatch full-index search: 00:00:00
# reads processed: 25469
# reads with at least one reported alignment: 25469 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 25469 alignments to 1 output stream(s)
Time searching: 00:00:01
Overall time: 00:00:01

and when I re-run Bowtie 1.2.1.1 a second time with exactly the same command-line options and input files, it runs fine and I get this:

Time loading forward index: 00:00:01
Time loading mirror index: 00:00:00
End-to-end 2/3-mismatch full-index search: 00:00:01
# reads processed: 25472
# reads with at least one reported alignment: 25472 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 25472 alignments to 1 output stream(s)
Time searching: 00:00:02
Overall time: 00:00:02

It is odd that the number of reads processed differs between the two runs even though the input FASTQ file is exactly the same in both cases. The input FASTQ file has 4388 reads!

When running Bowtie 1.2 with the same command-line parameters and input files, I get this:

Time loading forward index: 00:00:00
Time loading mirror index: 00:00:01
End-to-end 2/3-mismatch full-index search: 00:00:01
# reads processed: 4388
# reads with at least one reported alignment: 4388 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 25472 alignments to 1 output stream(s)
Time searching: 00:00:02
Overall time: 00:00:02

where the number of reads processed is 4388, as one would expect!
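
The mismatch between "reads processed" and the real read count can also be checked automatically. The sketch below uses hypothetical helper names and assumes the input is plain four-line FASTQ with no blank lines inside records. It pulls the count out of the Bowtie log and compares it with the number of records in the FASTQ file. Incidentally, the 25472 figure from the second 1.2.1.1 run equals the alignment count reported by Bowtie 1.2, although nothing in this thread confirms that the two are related.

```python
import re

# Matches the summary line that Bowtie writes to its log, as quoted above.
READS_PROCESSED = re.compile(r"^# reads processed: (\d+)$", re.MULTILINE)


def reads_processed(log_text):
    """Return the 'reads processed' count from a Bowtie log, or None."""
    match = READS_PROCESSED.search(log_text)
    return int(match.group(1)) if match else None


def count_fastq_reads(lines):
    """Count records, assuming four lines per read (an assumption of
    this sketch) and ignoring blank lines such as a trailing newline."""
    non_blank = [line for line in lines if line.strip()]
    if len(non_blank) % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    return len(non_blank) // 4


def counts_agree(log_text, fastq_lines):
    """True when the log's read count equals the FASTQ record count."""
    return reads_processed(log_text) == count_fastq_reads(fastq_lines)
```

With the paths from this report, `counts_agree(open("/tmp/log.stdout.txt").read(), open("/tmp/reads.fq"))` would be expected to return False for the 1.2.1.1 runs shown above and True for the Bowtie 1.2 run.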

Owner

BenLangmead commented Jun 14, 2017

Thank you for those details. We will look into it urgently and get back to you.

ch4rr0 self-assigned this Jun 14, 2017
