Functionalities

Pedro Q edited this page Jan 13, 2022 · 14 revisions

mantis <function>

Documentation is also available via console:

mantis -h

Setup databases:

mantis setup

Optional arguments: --chunk_size / -cs

This method will download and unzip all the default data into their respective folders. Please don't move any of this data around during execution!
You can use -cs or --chunk_size to set the number of HMM profiles per chunk. By default each HMM chunk has 5000 profiles; this improves throughput by keeping the available resources saturated.

Note: It's usually good practice to check your installation with mantis check; if something has failed, run mantis setup again. You can then run mantis run_test to be extra sure Mantis runs smoothly.
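The recommended order (setup, then check, then test run) can be scripted. The sketch below is only an illustration: it uses the three commands named on this page and the -cs option, the helper names setup_commands and run_all are hypothetical, and it assumes mantis is available on the PATH. By default it prints the commands without executing them.

```python
import subprocess


def setup_commands(chunk_size=None):
    """Return the setup, check and test commands in the recommended order."""
    setup = ["mantis", "setup"]
    if chunk_size is not None:
        # -cs sets the number of HMM profiles per chunk (5000 by default).
        setup += ["-cs", str(chunk_size)]
    return [setup, ["mantis", "check"], ["mantis", "run_test"]]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises on a non-zero exit code, stopping at the first failing step.
            subprocess.run(command, check=True)


run_all(setup_commands(chunk_size=5000))
```

Calling run_all(..., dry_run=False) would actually run the three steps in sequence and stop if one of them fails.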

Check installation:

mantis check

This method will check the paths from the MANTIS.cfg and check the installed custom references.

Check SQL databases:

mantis check_sql

This method will check the integrity of the metadata SQL databases for each of the reference databases.

Test run:

mantis run_test

This method will annotate a test sample with test HMMs.

Annotate one sample:

mantis run -i input.faa -o output_folder -od organism_details

Mandatory arguments: --input / -i
Optional arguments:  --output_folder / -o
                     --mantis_config / -mc
                     --evalue_threshold / -et
                     --overlap_value / -ov
                     --minimum_consensus_overlap / -mco
                     --domain_algorithm / -da
                     --time_limit / -tl
                     --organism_details / -od
                     --keep_files / -k
                     --skip_consensus / -sk
                     --no_unifunc / -nuf
                     --no_consensus_expansion / -nce
                     --no_taxonomy / -notax
                     --kegg_matrix / -km
                     --verbose_kegg_matrix / -vkm
                     --output_gff / -gff
                     --default_workers / -dw
                     --chunk_size / -cs
                     --hmmer_threads / -ht
                     --cores / -c
                     --memory / -m
  • input path to the target fasta file that Mantis will annotate.
  • output_folder If no output folder is provided, data will be saved to the current_path/target_date_time.
  • mantis_config use this option to use a custom MANTIS.cfg file
  • evalue_threshold use this to set the e-value threshold; see "What is the default e-value?" for the default.
  • overlap_value use this to allow partial overlap between hits. Default is 0.1, maximum is 0.3.
  • minimum_consensus_overlap use this to set the minimum residue overlap during consensus generation. Default is 0.7; use 0 to accept any consistent hit, regardless of residue overlap.
  • domain_algorithm use this to choose another algorithm for processing hits. Default is dfs; please see Intra-reference hit processing.
  • time_limit use this to set a time limit for the dfs algorithm. Default is 60 seconds.
  • organism_details use this to determine which references are used during taxon-specific annotation. It accepts an NCBI ID, organism name or GTDB lineage; if a string contains a blank space, enclose it in quotes (e.g. "genus species"). Any taxon level can be provided. If none is provided, only the general HMMs are used.
  • keep_files use this option to keep extra output files
  • skip_consensus use this option to skip the generation of the consensus_output.tsv file
  • no_unifunc use this option to not use UniFunc for similarity analysis of functional descriptions during consensus generation
  • no_consensus_expansion use this option to skip expansion of hits during consensus generation. A consensus_output.tsv file is still generated, however hits are not expanded based on their functional similarity to other hits (e.g. if two hits for the same residues point towards the same function, only one of them is kept because hit expansion is skipped; otherwise both hits would be kept)
  • kegg_matrix use this option to generate a KEGG module completeness matrix, please see KEGG module completeness for more details. You need the consensus_annotation.tsv file to run this!
  • verbose_kegg_matrix use this option to generate the KEGG module completeness matrix in verbose mode, please see KEGG module completeness for more details. You need the consensus_annotation.tsv file to run this!
  • default_workers use this to set the number of virtual workers used by Mantis. This is different from the number of physical cores. The default number of workers corresponds to the number of physical cores.
  • chunk_size use this to set the size of the chunks that your sample files will be divided into. 1000 sequences per chunk by default.
  • hmmer_threads use this to set the number of threads used by HMMER. 1 by default.
  • cores use this to set the number of physical cores used by Mantis. Mantis uses all available physical cores by default.
  • memory use this to set the amount of RAM used by Mantis (in GB). Mantis uses all available RAM by default.

Example

mantis run -i mantis/tests/test_sample.faa -od "Escherichia coli"
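When Mantis is launched from a script, the command line can be assembled from the options described above. The following is a minimal sketch, not part of Mantis itself: build_run_command is a hypothetical helper, it covers only a handful of the listed flags, and the lower bound of 0 on overlap_value is an assumption (the page only states the 0.3 maximum).

```python
import shlex


def build_run_command(input_path, output_folder=None, organism_details=None,
                      evalue_threshold=None, overlap_value=None, cores=None):
    """Assemble a `mantis run` command from a few of the documented options."""
    if overlap_value is not None and not 0 <= overlap_value <= 0.3:
        # The page gives 0.3 as the maximum; the lower bound of 0 is an assumption.
        raise ValueError("overlap_value must be between 0 and 0.3")
    command = ["mantis", "run", "-i", str(input_path)]
    optional = [
        ("-o", output_folder),
        ("-od", organism_details),
        ("-et", evalue_threshold),
        ("-ov", overlap_value),
        ("-c", cores),
    ]
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_run_command("mantis/tests/test_sample.faa",
                            organism_details="Escherichia coli")
# shlex.join quotes arguments that contain spaces, such as the organism name.
print(shlex.join(command))
```

Keeping the command as a list of arguments means an organism name with a blank space stays a single argument when passed to subprocess.run, so the quoting only matters when the command is printed or pasted into a shell.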

Annotate multiple samples:

mantis run -i target.tsv -o output_folder

Mandatory arguments: --input / -i
Optional arguments:  --output_folder / -o
                     --mantis_config / -mc
                     --evalue_threshold / -et
                     --overlap_value / -ov
                     --minimum_consensus_overlap / -mco
                     --domain_algorithm / -da
                     --time_limit / -tl
                     --keep_files / -k
                     --skip_consensus / -sk
                     --no_unifunc / -nuf
                     --no_consensus_expansion / -nce
                     --kegg_matrix / -km
                     --verbose_kegg_matrix / -vkm
                     --default_workers / -dw
                     --chunk_size / -cs
                     --hmmer_threads / -ht
                     --cores / -c
                     --memory / -m

The parameters for annotating multiple samples are very similar to those for annotating one sample; what changes is simply the format of the input. The target tsv file should have the following format:

Query name    Absolute sample path  Organism details           Genetic code
query_name_1  target_path_1         561
query_name_2  target_path_2         Proteobacteria
query_name_3  target_path_3
query_name_3  target_path_3         Clostridium_P perfringens
query_name_4  target_path_4         Escherichia coli           11

An example file is provided: example_file.tsv. The query name and the sample path are mandatory; the organism details and genetic code columns are optional. The organism details are relevant if you want to use taxon-specific databases to annotate your sample; here you can use the taxon name, NCBI or GTDB IDs, or GTDB lineages (e.g., d__Archaea;p__Halobacteriota;c__Methanosarcinia;o__Methanosarcinales;f__Methanosarcinaceae;g__Methanolobus;s__Methanolobus psychrophilus). The genetic code is only used if you are translating gene calls. By default, genetic code 11 is used.
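A target file in this layout can be generated with a few lines of Python. This is a sketch under assumptions: write_target_tsv is a hypothetical helper, the file is written without a header row, and trailing empty columns are dropped rather than written as empty fields. Check example_file.tsv for the layout Mantis actually expects.

```python
import csv
from pathlib import Path


def write_target_tsv(tsv_path, samples):
    """Write one tab-separated row per sample.

    Each sample is (query_name, sample_path, organism_details, genetic_code);
    the last two may be None when they are not needed.
    """
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name, path, organism, genetic_code in samples:
            # The sample path column must be absolute.
            row = [name, str(Path(path).resolve())]
            row.append("" if organism is None else str(organism))
            row.append("" if genetic_code is None else str(genetic_code))
            # Drop trailing empty optional columns (an assumption about the format).
            while row[-1] == "":
                row.pop()
            writer.writerow(row)


if __name__ == "__main__":
    samples = [
        ("query_name_1", "sample_1.faa", 561, None),
        ("query_name_2", "sample_2.faa", "Proteobacteria", None),
        ("query_name_3", "sample_3.faa", None, None),
        ("query_name_4", "sample_4.faa", "Escherichia coli", 11),
    ]
    write_target_tsv("target.tsv", samples)
```

Because the columns are separated by tabs, organism names that contain a blank space need no quoting inside the file.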

Other input formats

Mantis also accepts directory paths or compressed files (.gz, .zip, .tar.gz). Compressed samples are uncompressed for the run, and the uncompressed copies are deleted after execution. Keep in mind that with this method it's not possible to provide the taxonomic classification of each sample; for taxa-resolved annotations use the previous input methods.

Annotating metagenomes

Mantis scales well with metagenomes, since it automatically splits fasta files into evenly sized chunks, ensuring parallelization without the idle time you would otherwise get from iterating over sample sequences or from the differently sized HMM profile references.
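To make the idea of evenly sized chunks concrete, the sketch below splits the records of a fasta file into the fewest chunks that respect a maximum chunk size, with chunk sizes differing by at most one sequence. It illustrates the general technique only and is not Mantis's actual splitting code, which may balance chunks differently (for example by sequence length); read_fasta and split_evenly are hypothetical names.

```python
import math


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_evenly(records, chunk_size=1000):
    """Split records into the fewest chunks of at most chunk_size, sized as evenly as possible."""
    records = list(records)
    if not records:
        return []
    n_chunks = math.ceil(len(records) / chunk_size)
    base, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for index in range(n_chunks):
        # The first `extra` chunks take one additional record.
        size = base + (1 if index < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


fasta_lines = "\n".join(">seq%d\nMKV" % i for i in range(5)).splitlines()
records = read_fasta(fasta_lines)
print([len(chunk) for chunk in split_evenly(records, chunk_size=2)])  # prints [2, 2, 1]
```

With evenly sized chunks, each worker receives a similar amount of work, which is what avoids workers sitting idle while one oversized chunk finishes.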

Homology search tools

Mantis supports the use of HMMs using HMMER and BLAST-like homology search using Diamond.