Functionalities
mantis <function>
Documentation is also available via console:
mantis -h
mantis setup
Optional arguments: --chunk_size / -cs
This method will download and unzip all the default data into their respective path folders. Please don't move any of this data around during execution!
You can use -cs or --chunk_size to set the number of HMM profiles per chunk. By default each HMM chunk holds 5000 profiles, which improves throughput by keeping the available resources saturated.
Note: It's usually good practice to check your installation with mantis check; if something has failed, you should run mantis setup again. You can then run mantis run_test just to be extra sure Mantis runs smoothly.
mantis check
This method will check the paths from the MANTIS.cfg and check the installed custom references.
mantis check_sql
This method will check the integrity of the metadata SQL databases for each of the reference databases.
mantis run_test
This method will annotate a test sample with test HMMs.
mantis run -i input.faa -o output_folder -od organism_details
Mandatory arguments: --input / -i
Optional arguments: --output_folder / -o
--mantis_config / -mc
--evalue_threshold / -et
--overlap_value / -ov
--minimum_consensus_overlap / -mco
--domain_algorithm / -da
--time_limit / -tl
--organism_details / -od
--keep_files / -k
--skip_consensus / -sk
--no_unifunc / -nuf
--no_consensus_expansion / -nce
--no_taxonomy / -notax
--kegg_matrix / -km
--verbose_kegg_matrix / -vkm
--output_gff / -gff
--default_workers / -dw
--chunk_size / -cs
--hmmer_threads / -ht
--cores / -c
--memory / -m
- target: Mantis will run on the target fasta path.
- output_folder: if no output folder is provided, data will be saved to current_path/target_date_time.
- mantis_config: use this option to use a custom MANTIS.cfg file.
- evalue_threshold: see "What is the default e-value?".
- overlap_value: use this to allow partial overlap between hits. Default is 0.1, maximum is 0.3.
- minimum_consensus_overlap: use this to set your own minimum residue overlap during consensus generation. Default is 0.7; use 0 to accept any consistent hit, regardless of residue overlap.
- domain_algorithm: use this to choose another algorithm for processing hits. Default is dfs; please see Intra-reference hit processing.
- time_limit: use this to set a time limit for the dfs algorithm. Default is 60 seconds.
- organism_details: this variable determines which references to use during taxon-specific annotation. It accepts an NCBI ID, organism name or GTDB lineage; if the string contains a blank space, please enclose it in quotes (e.g. "genus species"). Any taxon level can be provided. If none is provided, only the general HMMs are used.
- keep_files: use this option to keep extra output files.
- skip_consensus: use this option to skip the generation of the consensus_output.tsv file.
- no_unifunc: use this option to not use UniFunc for similarity analysis of functional descriptions during consensus generation.
- no_consensus_expansion: use this option to skip expansion of hits during consensus generation. A consensus_output.tsv file is still generated, but hits are not expanded based on their functional similarity to other hits (e.g. if two hits for the same residues point towards the same function, only one of them is kept because hit expansion is skipped; otherwise both hits would be kept).
- kegg_matrix: use this option to generate a KEGG module completeness matrix; please see KEGG module completeness for more details. You need the consensus_annotation.tsv to run this!
- verbose_kegg_matrix: use this option to generate the KEGG module completeness matrix in verbose mode; please see KEGG module completeness for more details. You need the consensus_annotation.tsv to run this!
- default_workers: use this to set the number of virtual workers used by Mantis. This is different from the number of physical cores. By default, the number of workers corresponds to the number of physical cores.
- chunk_size: use this to set the size of the chunks that your sample files will be divided into. 1000 sequences per chunk by default.
- hmmer_threads: use this to set the number of threads used by HMMER. 1 by default.
- cores: use this to set the number of physical cores used by Mantis. Mantis uses all available physical cores by default.
- memory: use this to set the amount of RAM used by Mantis (in GB). Mantis uses all available RAM by default.
Example
mantis run -i mantis/tests/test_sample.faa -od "Escherichia coli"
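If you call Mantis from a script, the same command can be assembled programmatically. The snippet below is only a sketch: build_run_command is a hypothetical helper (not part of Mantis), it covers just a few of the flags listed above, and it assumes that -c and -m each take a single numeric value.

```python
import shlex


def build_run_command(input_path, output_folder=None, organism_details=None,
                      cores=None, memory=None):
    """Assemble a `mantis run` argument list (hypothetical helper, not part of Mantis)."""
    command = ["mantis", "run", "-i", str(input_path)]
    if output_folder is not None:
        command += ["-o", str(output_folder)]
    if organism_details is not None:
        # Kept as a single list element, so names containing spaces
        # need no manual quoting when passed to subprocess.
        command += ["-od", str(organism_details)]
    if cores is not None:
        # Assumption: -c takes the number of physical cores as one value.
        command += ["-c", str(cores)]
    if memory is not None:
        # Assumption: -m takes the amount of RAM in GB as one value.
        command += ["-m", str(memory)]
    return command


command = build_run_command("mantis/tests/test_sample.faa",
                            organism_details="Escherichia coli")
# Show the shell-quoted form of the command for inspection.
print(shlex.join(command))
```

With Mantis installed, the resulting list could be handed to subprocess.run(command, check=True); building a list rather than a single string avoids quoting problems with organism names such as "Escherichia coli".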
mantis run -i target.tsv -o output_folder
Mandatory arguments: --input / -i
Optional arguments: --output_folder / -o
--mantis_config / -mc
--evalue_threshold / -et
--overlap_value / -ov
--minimum_consensus_overlap / -mco
--domain_algorithm / -da
--time_limit / -tl
--keep_files / -k
--skip_consensus / -sk
--no_unifunc / -nuf
--no_consensus_expansion / -nce
--kegg_matrix / -km
--verbose_kegg_matrix / -vkm
--default_workers / -dw
--chunk_size / -cs
--hmmer_threads / -ht
--cores / -c
--memory / -m
The parameters for annotating multiple samples are very similar to those for annotating one sample; what changes is simply the format of the input. The target tsv file should have the following format:
Query name | Absolute sample path | Organism details | Genetic code |
---|---|---|---|
query_name_1 | target_path_1 | 561 | |
query_name_2 | target_path_2 | Proteobacteria | |
query_name_3 | target_path_3 | | |
query_name_3 | target_path_3 | Clostridium_P perfringens | |
query_name_4 | target_path_4 | Escherichia coli | 11 |
An example file is provided: example_file.tsv. The query name and the sample path are mandatory.
The organism details and genetic code columns are optional. The organism details are relevant if you want to use taxa-specific databases to annotate your sample; here you can use the taxon name, NCBI or GTDB IDs, or GTDB lineages (e.g., d__Archaea;p__Halobacteriota;c__Methanosarcinia;o__Methanosarcinales;f__Methanosarcinaceae;g__Methanolobus;s__Methanolobus psychrophilus).
The genetic code is only used if you are translating gene calls. By default, the genetic code 11 is used.
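A target file in this layout can also be generated from a script. The following is a minimal sketch, assuming the columns are tab-separated, that no header row is written, and that empty optional columns are written as empty fields; please compare against example_file.tsv before relying on it. write_target_tsv is a hypothetical helper, and the paths are the placeholders from the table above.

```python
import csv
import io


def write_target_tsv(handle, samples):
    """Write one tab-separated row per sample (hypothetical helper).

    Each sample is (query_name, sample_path, organism_details, genetic_code);
    the last two may be None and are then written as empty fields.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for query_name, sample_path, organism_details, genetic_code in samples:
        if not query_name or not sample_path:
            raise ValueError("query name and sample path are mandatory")
        writer.writerow([
            query_name,
            sample_path,
            "" if organism_details is None else organism_details,
            "" if genetic_code is None else genetic_code,
        ])


samples = [
    ("query_name_1", "target_path_1", 561, None),
    ("query_name_2", "target_path_2", "Proteobacteria", None),
    ("query_name_3", "target_path_3", None, None),
    ("query_name_4", "target_path_4", "Escherichia coli", 11),
]

# An in-memory buffer keeps the example self-contained;
# use open("target.tsv", "w", newline="") to write a real file.
buffer = io.StringIO()
write_target_tsv(buffer, samples)
tsv_text = buffer.getvalue()
print(tsv_text, end="")
```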
Mantis also accepts directory paths or compressed files (.gz, .zip, .tar.gz). Compressed samples will be uncompressed, and the uncompressed files deleted after execution. Keep in mind that with this method it's not possible to input the taxonomic classification of each sample; for taxa-resolved annotations use the previous input methods.
Mantis scales well with metagenomes, since it automatically splits fasta files into evenly sized chunks. This enables parallelization without the idle time you could otherwise get from iterating over sample sequences or over differently sized reference HMM profiles.
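To make the idea of chunking concrete, here is a rough sketch of splitting a fasta file into chunks holding a fixed number of sequences. This is an illustration only and not Mantis's actual implementation: read_fasta and split_into_chunks are hypothetical names, and Mantis may balance its chunks differently. The default of 1000 mirrors the chunk_size default of mantis run.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_into_chunks(records, chunk_size=1000):
    """Group records into lists of at most chunk_size sequences."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # The last chunk may hold fewer sequences than chunk_size.
        yield chunk


fasta_lines = [">seq1", "MKV", "LLA", ">seq2", "MTT", ">seq3", "MAA"]
chunks = list(split_into_chunks(read_fasta(fasta_lines), chunk_size=2))
for index, chunk in enumerate(chunks):
    print(index, [header for header, _ in chunk])
```

Each chunk can then be handed to a separate worker, so no worker has to wait for a whole sample to be processed sequentially.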
Mantis supports HMM-based searches using HMMER and BLAST-like homology searches using Diamond.