Incorrect results when using large index #35
Comments
I would like to recreate this issue. It would be helpful to have the FASTA file and the command line used for generating the small indexes as well.
mluypaert commented Nov 2, 2016
Okay, this Dropbox link is the complete FASTA file, which I used for the large index (I will disable this link later on). The file was split in two using this command:
And the bowtie indexes were all generated with the simplest version of the bowtie-build command:
And for the small indexes:
mluypaert commented Nov 27, 2016
@ch4rr0 Did you manage to recreate the issue?
Hello @mluypaert, Yes, I was able to recreate the problem. Unfortunately, the work required to resolve it is nontrivial, so we will delay the fix until the next bowtie release. We currently don't have a timeline for that release, but I will keep you posted as soon as it's decided.
mluypaert commented Feb 1, 2017
Hello @ch4rr0, I see on the bowtie website that version 1.2.0 was released on 12/12/2016. However, I don't see anything in the changelog about a fix for the large-index bug (this issue). I assume it was not fixed in release 1.2.0? Any idea which release will contain the fix and when to expect it to go public?
mluypaert commented Jun 2, 2017
@ch4rr0 @BenLangmead Can you please give me an update on this issue? Is there a timeline for the fix (by when and in which release)?
I am sorry for letting this one slip through the cracks. I have started looking into the issue and will reply to this thread as I progress. |
mluypaert commented Nov 2, 2016
I found that when mapping a pair of reads using bowtie 1 (tested with 1.1.1 and 1.1.2), the results when using a large index (a truly large index, not a large-index created from a small fasta) are incorrect.
The command used here was:
To prove that the large index was the problem, I split the input fasta I used to generate the large index in two parts, and generated small indexes from each part. Then I ran the same bowtie analysis on each small-index part and concatenated the results.
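The exact split command is not shown in this thread. As a rough illustration only, splitting a FASTA file into two parts at record boundaries could look like the Python sketch below. The function names and the half-and-half split by record count are assumptions for the example, not the method actually used here.

```python
def count_records(fasta_path):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def split_fasta_in_two(fasta_path, out_a, out_b):
    """Write the first half of the records to out_a and the rest to out_b.

    A record is never cut in the middle: the switch between output
    files can only happen on a header line. Returns the number of
    records written to each part.
    """
    total = count_records(fasta_path)
    first_half = (total + 1) // 2  # first part gets the extra record if odd
    seen = 0
    with open(fasta_path) as src, open(out_a, "w") as a, open(out_b, "w") as b:
        for line in src:
            if line.startswith(">"):
                seen += 1
            # Lines belong to the most recently seen header.
            target = a if seen <= first_half else b
            target.write(line)
    return first_half, total - first_half
```

Each part can then be indexed separately and the same reads mapped against both, as described above.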
The two-part small index analysis returned correct results (see bowtie_o45b8iO6CN_split_merged_output.txt), returning perfect matches on ENST00000551148, ENST00000549155, ENST00000546991, ENST00000392979 and ENST00000392977.
The large-index analysis however, returned
This page proves that the read pair from the input file should match perfectly to the five above-mentioned transcripts.
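For anyone reproducing this, a small Python sketch like the one below can list which expected transcripts are missing from each output file. It assumes bowtie's default tab-separated output, where the reference name is the third column; the `ref_column` parameter and the function names are hypothetical and would need adjusting if a different output format was used.

```python
def reference_names(bowtie_output_path, ref_column=2):
    """Collect the distinct reference names reported in an output file.

    Assumption: tab-separated lines with the reference name in
    column `ref_column` (0-based). Shorter lines are skipped.
    """
    names = set()
    with open(bowtie_output_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > ref_column:
                names.add(fields[ref_column])
    return names


def compare_hits(large_path, merged_path, expected):
    """Report expected references that are absent from each output."""
    expected = set(expected)
    large = reference_names(large_path)
    merged = reference_names(merged_path)
    return {
        "missing_from_large": sorted(expected - large),
        "missing_from_merged": sorted(expected - merged),
        "only_in_large": sorted(large - merged),
    }
```

It could be called with `bowtie_o45b8iO6CN_complete_largeidx_output.txt`, `bowtie_o45b8iO6CN_split_merged_output.txt` and the five transcript IDs listed above as `expected`.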
Can anyone please look into this?
P.S. I would add the FASTA file I used to generate this large index, but the file is 5.8Gb, so it is not fit for upload to GitHub. If you want it, let me know and we'll see how I can share it.
bowtie_o45b8iO6CN_complete_largeidx_output.txt
bowtie_o45b8iO6CN_split_merged_output.txt
job_o45b8iO6CN_bowtieIn.txt