blastx segfault #399

Closed
nextgenusfs opened this issue Oct 17, 2020 · 14 comments

Comments

@nextgenusfs

A few users have reported diamond v2.0.4 blastx segfaulting; more info here: nextgenusfs/funannotate#503

$ diamond blastx --threads 80 -q genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe

...
Building query seed array... [0.42s]
Computing hash join... [0.215s]
Building seed filter... [0.038s]
Searching alignments... [1.321s]
Deallocating buffers... [0.096s]
Clearing query masking... [0.285s]
Computing alignments...
Segmentation fault (core dumped)

It may be related to #397, as I don't see the same error in my smaller tests.

@estolle

estolle commented Oct 17, 2020

Same here.

I ran diamond against a protein database I had previously used successfully. The only difference now is the FASTA file I compare to the DB: this time it contains some very long contigs/scaffolds (up to 16 Mb).
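To see which queries are unusually long before running blastx, a small length scan of the FASTA file is enough. This is a minimal C++ sketch, not part of DIAMOND; the function name fasta_lengths is hypothetical, and it assumes the sequence ID is the first whitespace-delimited token after '>'.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a DIAMOND API): returns (id, length) for every
// record in a FASTA stream. The id is the first token after '>'.
std::vector<std::pair<std::string, std::size_t>> fasta_lengths(std::istream& in) {
    std::vector<std::pair<std::string, std::size_t>> out;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.empty()) continue;
        if (line[0] == '>') {
            std::istringstream header(line.substr(1));
            std::string id;
            header >> id;               // drop the description after the id
            out.emplace_back(id, 0);
        } else if (!out.empty()) {
            out.back().second += line.size();  // counts all residues, including N
        }
    }
    return out;
}
```

For example, `std::istringstream fa(">ctg1 desc\nACGTN\nACG\n>ctg2\nAC\n");` passed to `fasta_lengths` yields ctg1 with length 8 and ctg2 with length 2.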

I tried reducing the number of threads, excluding unaligned sequences, using the -c1 option as suggested for better performance, and relaxing the e-value threshold (from 1e-10 to 1e-8).
Diamond v2.0.4.142

diamond blastx --threads 8 -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe

Computing alignments...
segmentation fault

diamond blastx --threads 80 -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe

Computing alignments...
segmentation fault

diamond blastx --threads 6 --log -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe

Computing alignments...
No further output (as above), but no segfault either (still running).

diamond blastx -c1 --log --threads 80 -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-8 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe

...
Queries=0 size=3.07422 max_size=3.07422 next=R3_2 ETA=infs
Queries=0 size=3.07422 max_size=3.07422 next=R3_2 ETA=infs
Segmentation fault (core dumped)
Many lines of output like the last two above, then a segfault.

diamond blastx --unal 0 -c1 --log --threads 20 -q /scratch/ek/stingless.bee.genomics/annotation/funannotate/TetragonulaCarbonaria/2020_10_16_TetragonulaCarbonaria/predict_misc/genome.softmasked.fa --db diamond -o diamond.matches.tab -e 1e-10 -k 0 --more-sensitive -f 6 sseqid slen sstart send qseqid qlen qstart qend pident length evalue score qcovhsp qframe

Queries=0 size=2.68359 max_size=2.68359 next=R3_2 ETA=infs
Queries=0 size=2.68359 max_size=2.68359 next=R3_2 ETA=infs
Segmentation fault (core dumped)
Many lines of output like the last two above, then a segfault.

Any idea how to fix this, or whether the newer, highly contiguous genome sequences are the problem?

@bbuchfink
Owner

I was not able to reproduce a segfault in blastx in a quick test. Could you maybe make your query file available to me and let me know the database you're using, so I can look into this.

For very long queries, it would also be worth a try to use frameshift alignment mode which should work better in these cases (even if you don't expect frameshifts).

@estolle

estolle commented Oct 18, 2020

Hi

How can I activate the frameshift alignment mode? It's not clear to me from the diamond help which option is the correct one.

I ran a few more tests. It seems the segfault occurs with the first contig, which is 24 Mb long. If I split it at stretches of N, I get a 7 Mb, a 15 Mb and a 2 Mb piece. The 2 Mb piece works; the two larger ones do not. If I split the 7 Mb piece further into 3 pieces, all three pieces work.
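Splitting a contig at stretches of N, as described above, can be sketched in a few lines. This is a minimal C++ illustration, not part of DIAMOND; the function name split_at_n_runs and the min_gap threshold (the shortest N run treated as a gap) are hypothetical, and the coordinates of the resulting pieces are not tracked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a nucleotide sequence at gaps, i.e. runs of at
// least `min_gap` consecutive 'N'/'n'. Shorter N runs stay inside a piece;
// the gap characters themselves are dropped.
std::vector<std::string> split_at_n_runs(const std::string& seq, std::size_t min_gap) {
    std::vector<std::string> pieces;
    std::size_t start = 0;  // start of the current piece
    std::size_t i = 0;
    while (i < seq.size()) {
        if (seq[i] == 'N' || seq[i] == 'n') {
            std::size_t j = i;  // find the end of this N run
            while (j < seq.size() && (seq[j] == 'N' || seq[j] == 'n')) ++j;
            if (j - i >= min_gap) {
                if (i > start) pieces.push_back(seq.substr(start, i - start));
                start = j;  // the next piece begins after the gap
            }
            i = j;
        } else {
            ++i;
        }
    }
    if (start < seq.size()) pieces.push_back(seq.substr(start));
    return pieces;
}
```

With min_gap = 3, "ACGTNNNNNTTGCANAC" splits into "ACGT" and "TTGCANAC"; the single N stays inside the second piece.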

I'll send you a copy of the fasta (first contig) and the DB this afternoon.

Thanks a lot for your help!

Best
Eckart

@bbuchfink
Owner

For the frameshift mode, use -F with the penalty, for example -F 15.

@estolle

estolle commented Oct 18, 2020

I emailed you a small test dataset.

Your suggestion of using -F 15 appears to work for this small dataset! Nice! I am now running the full contig against the full database to check.

Could this error indicate that there are lots of frameshifts in the FASTA?
If I use this setting (-F 15) as a precautionary measure all the time, would this affect the quality of the results in "normal" cases?

@estolle

estolle commented Oct 18, 2020

So the first 7Mb of contig 1 vs a tiny test DB of proteins seemed to have worked with the -F 15 option.

I tried running a larger FASTA file against the full DB, and it ran out of RAM (I had 750 GB) and got killed. The same happens if I reduce the DB to the tiny test DB of proteins. Running only contig 1 (24 Mb) against the test DB is fine in terms of RAM (usage spikes now and then up to 170 GB), but then it throws an error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)

@bbuchfink
Owner

There was a problem with memory usage in the frameshift mode (see other issue). Please try again using the latest commit.

@estolle

estolle commented Oct 18, 2020

Is there a binary for the latest commit?

I cannot compile the GitHub clone, either on my system or within the conda env:

[ 81%] Building CXX object CMakeFiles/diamond.dir/src/basic/value.cpp.o
/home/ek/progz/diamond/src/tools/roc.cpp: In constructor ‘FamilyMapping::FamilyMapping(const string&)’:
/home/ek/progz/diamond/src/tools/roc.cpp:81:30: error: converting to ‘std::tuple<char, int>’ from initializer list would use explicit constructor ‘constexpr std::tuple<_T1, _T2>::tuple(_U1&&, _U2&&) [with _U1 = char&; _U2 = int&; = void; _T1 = char; _T2 = int]’
fam2fold[i.first->second] = { domain_class[0], fold };
^
[ 82%] Building CXX object CMakeFiles/diamond.dir/src/tools/merge_tsv.cpp.o
At global scope:
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-copy’
cc1plus: warning: unrecognized command line option ‘-Wno-implicit-fallthrough’
CMakeFiles/diamond.dir/build.make:2030: recipe for target 'CMakeFiles/diamond.dir/src/tools/roc.cpp.o' failed
make[2]: *** [CMakeFiles/diamond.dir/src/tools/roc.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:180: recipe for target 'CMakeFiles/diamond.dir/all' failed
make[1]: *** [CMakeFiles/diamond.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
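The roc.cpp error arises because `{ domain_class[0], fold }` copy-list-initializes a std::tuple<char, int>, and in older libstdc++ versions (before the C++17 "conditionally explicit" tuple constructors were implemented) the matching constructor is explicit, so the braced assignment is rejected. The cc1plus warnings about unrecognized -Wno-... options suggest an older GCC. One common workaround is sketched below; the map type and the helper set_fold are hypothetical stand-ins, and it is an assumption that commit 7c526e0 fixed it this way.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-in for the map in FamilyMapping (roc.cpp); the real
// key type and names may differ.
using FoldMap = std::map<std::string, std::tuple<char, int>>;

void set_fold(FoldMap& fam2fold, const std::string& family,
              const std::string& domain_class, int fold) {
    // fam2fold[family] = { domain_class[0], fold };
    //   ^ copy-list-initialization: rejected where this tuple constructor is explicit.
    fam2fold[family] = std::make_tuple(domain_class[0], fold);  // builds tuple<char, int> explicitly
}
```

Direct initialization, e.g. `std::tuple<char, int>(domain_class[0], fold)`, avoids the problem equally well.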

@estolle

estolle commented Oct 18, 2020

I tried a few CMake options but had no success:

[ 2%] Building CXX object CMakeFiles/arch_avx2.dir/src/dp/swipe/banded_3frame_swipe.cpp.o
In file included from /home/ek/progz/diamond/src/dp/swipe/swipe.cpp:28:0:
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_GENERIC::AsyncTargetBuffer<_t>::max_len() const [with _t = signed char]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_GENERIC::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = ARCH_GENERIC::score_vector; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:420:192: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < target_it.count; ++i)
^
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_GENERIC::AsyncTargetBuffer<_t>::max_len() const [with _t = short int]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_GENERIC::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = ARCH_GENERIC::score_vector; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:424:193: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_GENERIC::AsyncTargetBuffer<_t>::max_len() const [with _t = int]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_GENERIC::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = int; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:427:179: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-copy’
cc1plus: warning: unrecognized command line option ‘-Wno-implicit-fallthrough’
[ 3%] Building CXX object CMakeFiles/arch_sse4_1.dir/src/dp/swipe/swipe.cpp.o
[ 4%] Building CXX object CMakeFiles/arch_generic.dir/src/dp/swipe/banded_swipe.cpp.o
[ 5%] Building CXX object CMakeFiles/arch_avx2.dir/src/dp/swipe/swipe.cpp.o
[ 5%] Building CXX object CMakeFiles/arch_sse4_1.dir/src/dp/swipe/banded_swipe.cpp.o
In file included from /home/ek/progz/diamond/src/dp/swipe/swipe.cpp:28:0:
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_SSE4_1::AsyncTargetBuffer<_t>::max_len() const [with _t = signed char]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_SSE4_1::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = ARCH_SSE4_1::score_vector; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:420:192: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < target_it.count; ++i)
^
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_SSE4_1::AsyncTargetBuffer<_t>::max_len() const [with _t = short int]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_SSE4_1::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = ARCH_SSE4_1::score_vector; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:424:193: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_SSE4_1::AsyncTargetBuffer<_t>::max_len() const [with _t = int]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_SSE4_1::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = int; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:427:179: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
In file included from /home/ek/progz/diamond/src/dp/swipe/swipe.cpp:28:0:
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_AVX2::AsyncTargetBuffer<_t>::max_len() const [with _t = signed char]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_AVX2::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = ARCH_AVX2::score_vector; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:420:192: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < target_it.count; ++i)
^
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_AVX2::AsyncTargetBuffer<_t>::max_len() const [with _t = short int]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_AVX2::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = ARCH_AVX2::score_vector; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:424:193: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h: In instantiation of ‘int ARCH_AVX2::AsyncTargetBuffer<_t>::max_len() const [with _t = int]’:
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:356:35: required from ‘std::__cxx11::list DP::Swipe::ARCH_AVX2::swipe(const sequence&, Frame, DynamicIterator&, _cbs, int, std::vector&, Statistics&) [with _sv = int; _traceback = DP::VectorTraceback; _cbs = const signed char*]’
/home/ek/progz/diamond/src/dp/swipe/swipe.cpp:427:179: required from here
/home/ek/progz/diamond/src/dp/swipe/target_iterator.h:233:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-copy’
cc1plus: warning: unrecognized command line option ‘-Wno-implicit-fallthrough’
[ 6%] Building CXX object CMakeFiles/arch_generic.dir/src/search/collision.cpp.o
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-copy’
cc1plus: warning: unrecognized command line option ‘-Wno-implicit-fallthrough’
[ 7%] Building CXX object CMakeFiles/arch_avx2.dir/src/dp/swipe/banded_swipe.cpp.o

...

[ 73%] Building CXX object CMakeFiles/diamond.dir/src/align/gapped.cpp.o
[ 74%] Building CXX object CMakeFiles/diamond.dir/src/align/culling.cpp.o
[ 75%] Building CXX object CMakeFiles/diamond.dir/src/cluster/medoid.cpp.o
[ 75%] Building CXX object CMakeFiles/diamond.dir/src/cluster/cluster_registry.cpp.o
[ 76%] Building CXX object CMakeFiles/diamond.dir/src/cluster/multi_step_cluster.cpp.o
[ 77%] Building CXX object CMakeFiles/diamond.dir/src/cluster/mcl.cpp.o
[ 77%] Building CXX object CMakeFiles/diamond.dir/src/align/output.cpp.o
[ 78%] Building CXX object CMakeFiles/diamond.dir/src/tools/roc.cpp.o
/home/ek/progz/diamond/src/tools/roc.cpp: In constructor ‘FamilyMapping::FamilyMapping(const string&)’:
/home/ek/progz/diamond/src/tools/roc.cpp:81:30: error: converting to ‘std::tuple<char, int>’ from initializer list would use explicit constructor ‘constexpr std::tuple<_T1, _T2>::tuple(_U1&&, _U2&&) [with _U1 = char&; _U2 = int&; = void; _T1 = char; _T2 = int]’
fam2fold[i.first->second] = { domain_class[0], fold };
^
[ 79%] Building CXX object CMakeFiles/diamond.dir/src/test/data.cpp.o
At global scope:
cc1plus: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-copy’
cc1plus: warning: unrecognized command line option ‘-Wno-implicit-fallthrough’
CMakeFiles/diamond.dir/build.make:2030: recipe for target 'CMakeFiles/diamond.dir/src/tools/roc.cpp.o' failed
make[2]: *** [CMakeFiles/diamond.dir/src/tools/roc.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:180: recipe for target 'CMakeFiles/diamond.dir/all' failed
make[1]: *** [CMakeFiles/diamond.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
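The -Wsign-compare warnings from target_iterator.h are only warnings and do not stop the build; the failure is the roc.cpp error. For reference, the usual way to silence such a warning is to give the loop index the same unsigned type as the bound. In this minimal sketch, TargetIteratorStub and its members are hypothetical, since the log shows only that count is unsigned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the iterator in target_iterator.h.
struct TargetIteratorStub {
    std::size_t count;       // unsigned, like the operand in the warning
    std::vector<int> lens;   // hypothetical per-target lengths
};

int max_len(const TargetIteratorStub& target_it) {
    int m = 0;
    // An unsigned index matches the type of `count`, so no signed/unsigned
    // comparison occurs (the logged line uses `int i`).
    for (std::size_t i = 0; i < target_it.count; ++i)
        if (target_it.lens[i] > m) m = target_it.lens[i];
    return m;
}
```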

The 2.0.4 release compiles without problems:
wget http://github.com/bbuchfink/diamond/archive/v2.0.4.tar.gz
tar xzf v2.0.4.tar.gz
cd diamond-2.0.4
mkdir bin
cd bin
cmake ..
make -j4

@bbuchfink
Owner

I have tried to fix the compiler error here: 7c526e0

@estolle

estolle commented Oct 19, 2020

Awesome! It compiles now. Thanks so much for fixing this so quickly!

I'll try it on my dataset shortly.

@estolle

estolle commented Oct 19, 2020

It works in under 1 minute and with a tiny RAM footprint if I use the -F 15 option (without it, there is still a segfault).

Thanks for fixing it so rapidly!

@bbuchfink
Owner

Glad it is working now. I'll look into the segfault too, but it should be fine using the frameshift mode.

@bbuchfink
Owner

Sorry this took longer, but the segfault should be fixed in the latest release.

3 participants