mmseqs2 prefilter failed: No k-mer could be extracted for the database genomad_db/genomad_db #25

Closed
haleyhallowell opened this issue Jun 30, 2023 · 4 comments

Comments

@haleyhallowell

Hello! I am trying to use the geNomad annotate module to annotate a .fna file of MEGAHIT-assembled contigs. After downloading the database and adding a unique identifier to each FASTA header in the .fna file (because they all started with k127), I ran the following command:

genomad annotate final_vOTUs_numbered.fna ./genomad_output ./genomad_db

I get this error directly from genomad:

Traceback (most recent call last):
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 137, in run_mmseqs2
subprocess.run(command, stdout=fout, stderr=fout, check=True)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mmseqs', 'search', PosixPath('genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/query_db/query_db'), PosixPath('genomad_db/genomad_db'), PosixPath('genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/search_db/search_db'), PosixPath('genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/tmp'), '--threads', '48', '-s', '4.2', '--cov-mode', '1', '-c', '0.2', '-e', '0.001', '--split', '0', '--split-mode', '0', '--max-seqs', '1000000', '--min-ungapped-score', '20', '--max-rejected', '225']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/hhallow1/.conda/envs/genomad/bin/genomad", line 8, in <module>
sys.exit(cli())
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/rich_click/rich_group.py", line 21, in main
rv = super().main(*args, standalone_mode=False, **kwargs)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/genomad/cli.py", line 425, in annotate
genomad.annotate.main(
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/genomad/modules/annotate.py", line 202, in main
mmseqs2_obj.run_mmseqs2(threads, sensitivity, evalue, splits)
File "/home/hhallow1/.conda/envs/genomad/lib/python3.10/site-packages/genomad/mmseqs2.py", line 140, in run_mmseqs2
raise Exception(f"'{command_str}' failed.") from e
Exception: 'mmseqs search genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/search_db/search_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/tmp --threads 48 -s 4.2 --cov-mode 1 -c 0.2 -e 0.001 --split 0 --split-mode 0 --max-seqs 1000000 --min-ungapped-score 20 --max-rejected 225' failed.

Here is the output from mmseqs2:

Converting sequences
[=====
Time for merging to query_db_h: 0h 0m 0s 79ms
Time for merging to query_db: 0h 0m 0s 31ms
Database type: Aminoacid
Time for processing: 0h 0m 1s 380ms
search genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/search_db/search_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/tmp --threads 48 -s 4.2 --cov-mode 1 -c 0.2 -e 0.001 --split 0 --split-mode 0 --max-seqs 1000000 --min-ungapped-score 20 --max-rejected 225

MMseqs Version: 13.45111
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.2
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Max reject 225
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Threads 48
Compressed 0
Verbosity 3
Seed substitution matrix nucl:nucleotide.out,aa:VTML80.out
Sensitivity 4.2
k-mer length 5
k-score 2147483647
Alphabet size nucl:5,aa:21
Max results per query 1000000
Split database 0
Split mode 0
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 1
Mask lower case residues 0
Minimum diagonal score 20
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Maximum seq. id. threshold 0.9
Minimum seq. id. 0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Chain overlapping alignments 0
Merge query 1
Search type 0
Search iterations 1
Start sensitivity 4
Search steps 1
Exhaustive search mode false
Filter results during exhaustive search 0
Strand selection 1
LCA search mode false
Disk space limit 0
MPI runner
Force restart with latest tmp false
Remove temporary files false

prefilter genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/query_db/query_db genomad_db/genomad_db genomad_output/final_vOTUs_numbered_annotate/final_vOTUs_numbered_mmseqs2/tmp/4444936417411739143/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --seed-sub-mat nucl:nucleotide.out,aa:VTML80.out -s 4.2 -k 5 --k-score 2147483647 --alph-size nucl:5,aa:21 --max-seq-len 65535 --max-seqs 1000000 --split 0 --split-mode 0 --split-memory-limit 0 -c 0.2 --cov-mode 1 --comp-bias-corr 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-lower-case 0 --min-ungapped-score 20 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca 1 --pcb 1.5 --threads 48 --compressed 0 -v 3

Query database size: 56046 type: Aminoacid
Estimated memory consumption: 1G
Target database size: 227897 type: Profile
Process prefiltering step 1 of 1

Index table k-mer threshold: 104 at k-mer size 5
Index table: counting k-mers
[=================================================================] 227.90K 10s 479ms
Index table: Masked residues: 0
No k-mer could be extracted for the database genomad_db/genomad_db.
Maybe the sequences length is less than 14 residues.
Error: Prefilter died

The .fna file, the genomad_output directory, and the genomad_db directory are all in the same directory, and I am running the command from that directory as well. Any ideas on how to fix this? Thanks!!

@apcamargo
Owner

Hi @haleyhallowell. I don't think I've seen this error before. Do you still get it if you remove short sequences? Could you share the input with me?

@haleyhallowell
Author

haleyhallowell commented Jun 30, 2023

Hey! Sure, I can share it with you; it is attached! I'm currently waiting on another run to go through the HPC queue. I noticed that when I used pip install genomad, I had to manually install prodigal-gv and mmseqs2, so I made a new environment using a conda install. I'll update you once that goes through!
final_vOTUs_numbered.txt.zip

I'm not sure about the short sequences. Nothing should be that short, as I filter out anything <2000 bp.
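
For anyone who wants to double-check a length filter like this before rerunning, here is a minimal sketch in Python. It assumes a plain, uncompressed FASTA file; the helper names `read_fasta` and `records_shorter_than`, the example headers, and the default cutoff of 2000 are illustrative choices and not part of geNomad or MMseqs2.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def records_shorter_than(
    lines: Iterable[str], min_length: int = 2000
) -> List[Tuple[str, int]]:
    """Return (header, length) for every record shorter than min_length."""
    return [(h, len(s)) for h, s in read_fasta(lines) if len(s) < min_length]


# Hypothetical in-memory example: one 2400 bp record and one 8 bp record.
example = [">k127_1_a", "ACGT" * 600, ">k127_2_b", "ACGT", "ACGT"]
print(records_shorter_than(example))
```

To run it on a real file, pass an open file handle instead of the list, for example `records_shorter_than(open("final_vOTUs_numbered.fna"))`. An empty result means no record falls under the cutoff.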

@haleyhallowell
Author

Hi! Just wanted to let you know I got it working. It seems the unusual pip install I mentioned above caused this error; installing with conda the second time fixed it. Thanks!

@apcamargo
Owner

Thanks for letting me know! I'll close the issue.

Maybe it was a problem with the MMseqs2 version? The latest versions of geNomad are only compatible with version 14-7e284.
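
The MMseqs2 log earlier in this thread reports `MMseqs Version: 13.45111`, which fits that guess. Below is a minimal sketch of how the version line could be pulled out of a saved log and compared with the expected release. The function names are hypothetical, and treating `-` and `.` as interchangeable separators is an assumption, since package managers may render the release tag differently.

```python
import re
from typing import Optional

# Expected release, taken from the comment above.
EXPECTED_VERSION = "14-7e284"


def extract_mmseqs_version(log_text: str) -> Optional[str]:
    """Return the value of the 'MMseqs Version:' line, or None if absent."""
    match = re.search(r"^MMseqs Version:\s*(\S+)", log_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def same_version(found: str, expected: str = EXPECTED_VERSION) -> bool:
    """Compare versions, assuming '-' and '.' are equivalent separators."""
    def normalize(version: str) -> str:
        return version.strip().replace("-", ".")

    return normalize(found) == normalize(expected)


# Excerpt copied from the log posted in this issue.
log_excerpt = (
    "Time for processing: 0h 0m 1s 380ms\n"
    "MMseqs Version: 13.45111\n"
    "Substitution matrix nucl:nucleotide.out,aa:blosum62.out\n"
)

found = extract_mmseqs_version(log_excerpt)
if found is None:
    print("No MMseqs version line found in the log.")
elif same_version(found):
    print(f"MMseqs2 {found} matches the expected release.")
else:
    print(f"MMseqs2 {found} differs from the expected {EXPECTED_VERSION}.")
```

A mismatch does not prove the version caused the prefilter error, but it is a quick first thing to rule out.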
