Bowtie hangs when running on very large genomes #124

Closed
ferayd opened this issue Jul 1, 2021 · 8 comments

@ferayd

ferayd commented Jul 1, 2021

I downloaded the "bread wheat" genome from here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/220/415/GCA_002220415.3_Triticum_4.0/GCA_002220415.3_Triticum_4.0_genomic.fna.gz

Then I ran bowtie-build:
bowtie-build GCA_002220415.3_Triticum_4.0_genomic.fna WHEAT_JHU4_genome

Then I created a query input file (raw) called reads.txt, with a single sequence in it. For example:
AAAAAAAAAAAAAAAAAAAA

Then I ran bowtie:
bowtie -x WHEAT_JHU4_genome -r reads.txt

It hangs. If I split the genome into two pieces, it works for each piece, so I think the problem is caused by the size of the genome.

I am attaching the verbose output. It shows where bowtie hangs:
bowtie_verbose.txt
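
As a rough sanity check on the size hypothesis, the sketch below counts the bases in a FASTA file and compares a total against the largest value a 32-bit offset can hold. The length used in the example is the `len` value from the index headers posted later in this thread. The helper names `fasta_len` and `index_kind` are hypothetical, and the exact cutoff bowtie-build applies when choosing between a small and a large index is an assumption here. This does not identify the bug; it only suggests that this genome needs 64-bit offsets.

```shell
#!/bin/sh
# Sketch: would a reference of this length fit in 32-bit offsets?
# Assumption: the relevant limit is 2^32 - 1; bowtie-build's real cutoff may differ.

max_u32=4294967295

fasta_len() {
    # Count sequence characters, skipping header lines and line endings.
    grep -v '^>' "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

index_kind() {
    # Prints "large" if the length exceeds the 32-bit limit, else "small".
    if [ $(( $1 > max_u32 )) -eq 1 ]; then echo large; else echo small; fi
}

# "len" as reported in the index headers in this thread
index_kind 15397713314    # prints: large
```

On a real file, `index_kind "$(fasta_len GCA_002220415.3_Triticum_4.0_genomic.fna)"` would do the same check, at the cost of reading the whole FASTA once.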

Thanks

@ch4rr0
Collaborator

ch4rr0 commented Jul 13, 2021

I am not able to get bowtie-build to successfully build the index, but if I use a bowtie2 index then the alignment runs successfully.

I am investigating why bowtie-build is not able to build the index.

./bowtie-align-l -x ../bowtie2/triticum -r reads.raw
0	+	CM022213.1	49156482	AAAAAAAAAAAAAAAAAAAA	IIIIIIIIIIIIIIIIIIII	21146	
# reads processed: 1
# reads with at least one alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments

@blakemeyers

I am also wondering about the status of the fix for this problem, as we need to work with multiple large genomes on the same scale as wheat (e.g. rye, barley, oat). Are there any updates on either the original problem (hanging during mapping against the indexed genome) or the second problem (failure to build the index)?

Thank you!

@ch4rr0
Collaborator

ch4rr0 commented Aug 10, 2021

I am still actively looking into this one but have yet to figure out the underlying cause of the bug. As a temporary workaround, you can build those indexes with bowtie2 and use them for alignment in bowtie.
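
The workaround could look roughly like the sketch below. bowtie2-build takes the same reference and base-name arguments as bowtie-build, and the alignment command mirrors the one shown earlier in this thread. The `run` and `build_and_align` helpers and the `DRY_RUN` switch are hypothetical, added only so the commands can be previewed without either tool installed.

```shell
#!/bin/sh
# Sketch of the workaround: build the index with bowtie2-build, align with bowtie.
# DRY_RUN=1 prints each command instead of executing it (hypothetical helper).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

build_and_align() {
    ref=$1      # reference FASTA
    base=$2     # index base name
    reads=$3    # raw reads file, one sequence per line
    run bowtie2-build "$ref" "$base" &&
    run bowtie-align-l -x "$base" -r "$reads"
}

# Preview the commands; set DRY_RUN=0 (with both tools on PATH) to run them.
DRY_RUN=1
build_and_align GCA_002220415.3_Triticum_4.0_genomic.fna triticum reads.txt
```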

@blakemeyers

This workaround would only address the secondary issue that you encountered with indexing, but not the original problem of bowtie hanging during the mapping/aligning stage, right? I don't think the originally reported problem was with indexing.

@ch4rr0
Collaborator

ch4rr0 commented Aug 10, 2021

> I am not able to get bowtie-build to successfully build the index, but if I use a bowtie2 index then the alignment runs successfully.
>
> I am investigating why bowtie-build is not able to build the index.
>
> ./bowtie-align-l -x ../bowtie2/triticum -r reads.raw
> 0	+	CM022213.1	49156482	AAAAAAAAAAAAAAAAAAAA	IIIIIIIIIIIIIIIIIIII	21146
> # reads processed: 1
> # reads with at least one alignment: 1 (100.00%)
> # reads that failed to align: 0 (0.00%)
> Reported 1 alignments

I am quite confident it is an index issue.

@ch4rr0
Collaborator

ch4rr0 commented Sep 2, 2021

We have finally tracked down and pushed a fix for this bug to the bug_fixes branch.
We thank all of you who have been impacted by this issue for your patience, and are
in the process of putting together an official release which will include this change.

./bowtie-build-l GCA_002220415.3_Triticum_4.0_genomic.fna triticum --threads 12 --packed
...
Wrote 4422321688 bytes to primary EBWT file: triticum.1.ebwtl
Wrote 3849428340 bytes to secondary EBWT file: triticum.2.ebwtl
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 15397713314
    bwtLen: 15397713315
    sz: 3849428329
    bwtSz: 3849428329
    lineRate: 7
    linesPerSide: 1
    offRate: 5
    offMask: 0xffffffffffffffe0
    isaRate: -1
    isaMask: 0xffffffff
    ftabChars: 10
    eftabLen: 20
    eftabSz: 160
    ftabLen: 1048577
    ftabSz: 8388616
    offsLen: 481178542
    offsSz: 3849428336
    isaLen: 0
    isaSz: 0
    lineSz: 128
    sideSz: 128
    sideBwtSz: 112
    sideBwtLen: 448
    numSidePairs: 17184948
    numSides: 34369896
    numLines: 34369896
    ebwtTotLen: 4399346688
    ebwtTotSz: 4399346688
    reverse: 0
...
Wrote 4422321688 bytes to primary EBWT file: triticum.rev.1.ebwtl
Wrote 3849428340 bytes to secondary EBWT file: triticum.rev.2.ebwtl
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 15397713314
    bwtLen: 15397713315
    sz: 3849428329
    bwtSz: 3849428329
    lineRate: 7
    linesPerSide: 1
    offRate: 5
    offMask: 0xffffffffffffffe0
    isaRate: -1
    isaMask: 0xffffffff
    ftabChars: 10
    eftabLen: 20
    eftabSz: 160
    ftabLen: 1048577
    ftabSz: 8388616
    offsLen: 481178542
    offsSz: 3849428336
    isaLen: 0
    isaSz: 0
    lineSz: 128
    sideSz: 128
    sideBwtSz: 112
    sideBwtLen: 448
    numSidePairs: 17184948
    numSides: 34369896
    numLines: 34369896
    ebwtTotLen: 4399346688
    ebwtTotSz: 4399346688
    reverse: 0

ls triticum*.ebwtl
triticum.1.ebwtl  triticum.3.ebwtl  triticum.rev.1.ebwtl
triticum.2.ebwtl  triticum.4.ebwtl  triticum.rev.2.ebwtl

./bowtie-align-l -x triticum -c AAAAAAAAAAAAAAAAAAAA
0	+	CM022213.1	49156482	AAAAAAAAAAAAAAAAAAAA	IIIIIIIIIIIIIIIIIIII	21146	
# reads processed: 1
# reads with at least one alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments
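
For readers who want to sanity-check the header dump above, the sketch below recomputes several of the reported fields from `len`, `lineRate`, `offRate` and `ftabChars`. The formulas are inferred from the printed values rather than taken from bowtie's source, and the 8-byte offset size and the 16-byte per-side overhead (128 - 112) are assumptions. The `check` and `header_checks` helpers are hypothetical.

```shell
#!/bin/sh
# Recompute some reported index header fields from len and the rate parameters.
# Formulas are inferred from the log values, not from bowtie's source code.

len=15397713314      # len
line_rate=7          # lineRate
off_rate=5           # offRate
ftab_chars=10        # ftabChars
off_bytes=8          # assumption: 8-byte (64-bit) offsets in a large index

check() {            # check NAME COMPUTED REPORTED
    if [ "$2" = "$3" ]; then echo "$1: $2 ok"; else echo "$1: $2 != $3"; fi
}

header_checks() {
    bwt_len=$(( len + 1 ))
    bwt_sz=$(( (bwt_len + 3) / 4 ))                  # 4 bases per byte, rounded up
    side_sz=$(( 1 << line_rate ))                    # linesPerSide is 1
    side_bwt_sz=$(( side_sz - 2 * off_bytes ))       # assumed per-side overhead
    num_sides=$(( (bwt_sz + side_bwt_sz - 1) / side_bwt_sz ))
    offs_len=$(( (bwt_len + (1 << off_rate) - 1) >> off_rate ))
    ftab_len=$(( (1 << (2 * ftab_chars)) + 1 ))      # 4^ftabChars + 1

    check bwtLen     "$bwt_len"                    15397713315
    check bwtSz      "$bwt_sz"                     3849428329
    check sideBwtSz  "$side_bwt_sz"                112
    check sideBwtLen "$(( side_bwt_sz * 4 ))"      448
    check numSides   "$num_sides"                  34369896
    check ebwtTotSz  "$(( num_sides * side_sz ))"  4399346688
    check offsLen    "$offs_len"                   481178542
    check offsSz     "$(( offs_len * off_bytes ))" 3849428336
    check ftabLen    "$ftab_len"                   1048577
    check ftabSz     "$(( ftab_len * off_bytes ))" 8388616
}

header_checks
```

Each line ends in `ok` when the recomputed value matches the number in the log, which makes a truncated or inconsistent header easy to spot.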

@blakemeyers

Thank you so much! I really appreciate this. We'll test it out and let you know if we encounter any issues.

@ch4rr0
Collaborator

ch4rr0 commented Sep 14, 2021

This change is now available in v1.3.1. Thank you for providing sample files and for helping test.
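
To check whether an installed bowtie is new enough to include this fix, a dotted-version comparison along these lines may help. `version_ge` is a hypothetical helper. How the version string is obtained (for example by parsing the output of `bowtie --version`) is left out, because the exact output format is an assumption.

```shell
#!/bin/sh
# Sketch: compare dotted numeric versions, e.g. to test for bowtie >= 1.3.1.

version_ge() {   # version_ge A B: succeeds if version A >= version B
    a=$1; b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}              # leading component of each version
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac   # drop the compared component
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0                                # all components equal
}

for v in 1.2.3 1.3.1; do
    if version_ge "$v" 1.3.1; then
        echo "$v: includes the fix"
    else
        echo "$v: predates the fix"
    fi
done
```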

ch4rr0 closed this as completed Sep 14, 2021