blastx segfault #399
Same here. I ran diamond against a protein database I had previously used successfully. The only difference now is the FASTA file I compare against the DB: this time it contains some very long contigs/scaffolds (up to 16 Mb). I tried reducing the number of threads, excluding unaligned sequences, using the `-c1` option as suggested for better performance, and changing the e-value cutoff (from 1e-10 to 1e-8):

```
diamond blastx --threads 8 -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
Computing alignments...

diamond blastx --threads 80 -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
Computing alignments...

diamond blastx --threads 6 --log -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
Computing alignments...

diamond blastx -c1 --log --threads 80 -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-8 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
...

diamond blastx --unal 0 -c1 --log --threads 20 -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe
Queries=0 size=2.68359 max_size=2.68359 next=R3_2 ETA=infs
```

Any idea how to fix this, or whether the newer, very contiguous genome sequences are a problem?
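The `-f 6` commands above request a custom set of 14 tabular columns. To load that output for downstream checks, a minimal Python sketch could look like the following. It assumes a tab-separated file with no header whose columns appear exactly in the order given on the command line; the function name `parse_hits` is hypothetical and not part of DIAMOND.

```python
import csv

# Column order copied from the -f 6 field list in the commands above.
FIELDS = ["sseqid", "slen", "sstart", "send", "qseqid", "qlen", "qstart",
          "qend", "pident", "length", "evalue", "score", "qcovhsp", "qframe"]
# Sequence IDs stay strings.
STR_FIELDS = {"sseqid", "qseqid"}
# Columns that may contain decimals or exponents (score parsed as float as a precaution).
FLOAT_FIELDS = {"pident", "evalue", "score", "qcovhsp"}


def parse_hits(lines):
    """Yield one dict per hit; all remaining columns are parsed as int."""
    reader = csv.reader(lines, delimiter="\t")
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(row)}")
        hit = {}
        for name, value in zip(FIELDS, row):
            if name in STR_FIELDS:
                hit[name] = value
            elif name in FLOAT_FIELDS:
                hit[name] = float(value)
            else:
                hit[name] = int(value)
        yield hit
```

Usage: `with open("diamond.matches.tab") as fh: hits = list(parse_hits(fh))`.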
I was not able to reproduce a segfault in blastx in a quick test. Could you make your query file available to me and tell me which database you're using, so I can look into this? For very long queries, it would also be worth trying the frameshift alignment mode, which should work better in these cases (even if you don't expect frameshifts).
Hi, how can I activate the frameshift alignment mode? It's not clear to me from `diamond help` which option is the correct one. I ran a few more tests. It seems the segfault occurs with the first contig, which is 24 Mb long. If I split it at stretches of N, I get a 7 Mb, a 15 Mb and a 2 Mb piece. The 2 Mb piece works; the two larger ones do not. If I split the 7 Mb piece further into 3 pieces, all three pieces work. I'll send you a copy of the FASTA (first contig) and the DB this afternoon. Thanks a lot for your help! Best
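Splitting a contig at stretches of N, as described above, can be scripted. A minimal Python sketch, assuming a gap is a run of at least `min_gap` N/n characters (the default of 10 is an arbitrary assumption) and using a hypothetical `<name>_<offset>` header scheme so that each piece's 0-based start in the original contig is preserved:

```python
import re


def split_at_gaps(seq, min_gap=10):
    """Split seq at runs of >= min_gap N/n characters.

    Returns (offset, piece) tuples; offset is the piece's 0-based start
    in the original sequence.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces = []
    start = 0
    for m in gap.finditer(seq):
        if m.start() > start:  # skip empty pieces (e.g. leading gaps)
            pieces.append((start, seq[start:m.start()]))
        start = m.end()
    if start < len(seq):
        pieces.append((start, seq[start:]))
    return pieces


def pieces_to_fasta(name, pieces, width=60):
    """Format pieces as FASTA records named <name>_<offset> (hypothetical scheme)."""
    out = []
    for offset, piece in pieces:
        out.append(f">{name}_{offset}")
        out.extend(piece[i:i + width] for i in range(0, len(piece), width))
    return "".join(line + "\n" for line in out)
```

Runs shorter than `min_gap` stay inside a piece, so the threshold controls how aggressively a scaffold is broken up.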
For the frameshift mode, use `-F 15`.
I emailed you a small test dataset. Your suggestion of using `-F 15` appears to work for this small dataset! Nice! I am now running the full contig/database to see. Could this error suggest that there are lots of frameshifts present in the FASTA?
So the first 7 Mb of contig 1 against a tiny test DB of proteins seems to have worked with the `-F 15` option. I tried running a larger FASTA file against the full DB, and it ran out of RAM (I had 750 GB) and got killed. The same happens if I reduce the DB (to the tiny test DB of proteins). Running only contig 1 (24 Mb) against the test DB is OK with respect to RAM (it spikes every now and then up to 170 GB), but then it throws an error:
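Another way to keep each query small, in line with the observation above that smaller pieces ran, is to cut a long contig into fixed-size overlapping windows and then shift hit coordinates back onto the original contig. This Python sketch is illustrative only: the window size and overlap are parameters you would have to choose, the function names are hypothetical, and it assumes `qstart`/`qend` are 1-based positions within the window, so adding the window's 0-based start yields 1-based contig positions.

```python
def windows(seq_len, size, overlap):
    """Return 0-based half-open (start, end) windows of at most `size` bases
    covering [0, seq_len), with consecutive windows overlapping by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    if seq_len == 0:
        return []
    step = size - overlap
    result = []
    start = 0
    while True:
        end = min(start + size, seq_len)
        result.append((start, end))
        if end == seq_len:
            break
        start += step
    return result


def to_contig_coords(qstart, qend, window_start):
    """Shift 1-based window coordinates to 1-based contig coordinates.

    The order of qstart/qend is kept as reported, so strand information
    encoded in that order is preserved.
    """
    return qstart + window_start, qend + window_start
```

Hits that fall in the overlap between two windows may be reported twice and would need deduplicating after the shift.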
There was a problem with memory usage in the frameshift mode (see other issue). Please try again using the latest commit. |
Is there a binary for the latest commit? I cannot compile the GitHub clone, neither on my system nor within the conda env:

```
[ 81%] Building CXX object CMakeFiles/diamond.dir/src/basic/value.cpp.o
```
I tried a few options with CMake, but no success:

```
[ 2%] Building CXX object CMakeFiles/arch_avx2.dir/src/dp/swipe/banded_3frame_swipe.cpp.o
...
[ 73%] Building CXX object CMakeFiles/diamond.dir/src/align/gapped.cpp.o
```

The 2.0.4 release compiles without problems.
I have tried to fix the compiler error here: 7c526e0 |
Awesome! It compiles now. Thanks so much for fixing this so quickly! I'll try it on my dataset shortly.
It works in under 1 minute and with a tiny RAM footprint if I use the `-F 15` option (without it, there is still a segfault). Thanks for fixing it so rapidly!
Glad it is working now. I'll look into the segfault too, but it should be fine using the frameshift mode.
Sorry this took longer, but the segfault should be fixed in the latest release.
A few users have reported diamond v2.0.4 blastx segfaulting; more info here: nextgenusfs/funannotate#503
It could possibly be related to #397, as I don't see this error in my smaller tests.