Configuration

Pedro Q edited this page Jan 13, 2022 · 16 revisions

Installation

  1. conda install -c bioconda mantis_pfa
  2. mantis setup

Mantis comes with a MANTIS.cfg file, which serves as the default configuration for all users on the system. To use your own configuration, copy this file, edit it as you wish, and then add -mc <path/to/edited_MANTIS.cfg> when running Mantis.

Setting your own paths

You might want to store your reference databases in another path besides the default References folder. If you want to keep all the reference databases in one specific folder change:

default_ref_folder=/path/to/mantis/ref/  

If you want to have each reference in specific folders, then change:

nog_ref_folder=/path/to/nog/  
ncbi_ref_folder=/path/to/ncbi/  
pfam_ref_folder=/path/to/pfam/  
kofam_ref_folder=/path/to/kofam/  
tcdb_ref_folder=/path/to/tcdb/  

If you don't want a particular reference to be used, set its path to NA, for example: nog_ref_folder=NA
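The path settings above are plain key=value lines, so an edited copy of MANTIS.cfg can also be produced with a few lines of Python. This is a minimal sketch, not part of Mantis: the helper name set_cfg_value and the example file content are hypothetical, and it assumes each of these keys appears at most once in the file (so it is not suited to repeated keys).

```python
def set_cfg_value(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with `key=value` set.

    An existing `key=...` line is replaced; otherwise the line is appended.
    Hypothetical helper: assumes the key appears at most once in the file.
    """
    new_line = f"{key}={value}"
    lines = cfg_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if line.strip().startswith(f"{key}="):
            lines[index] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Illustrative content only; the real MANTIS.cfg contains more settings.
example_cfg = "default_ref_folder=\nnog_ref_folder=/old/path/to/nog/\n"
updated = set_cfg_value(example_cfg, "default_ref_folder", "/path/to/mantis/ref/")
updated = set_cfg_value(updated, "nog_ref_folder", "NA")  # disable the NOG reference
print(updated)
```

The resulting text can be written to your own copy of the file and passed to Mantis with -mc.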

Important: All of the default references belong to their respective authors. I haven't compiled any of this data; I'm merely distributing it in a more automated manner! Make sure you cite them when using this tool/their data.

Custom references

Custom references can be added in MANTIS.cfg by adding their absolute path or folder path, for example:

custom_ref=path/to/ref/custom1.hmm
custom_ref=path/to/ref/custom2.dmnd

Alternatively you may add them to the custom_refs folder, for example:

Mantis/References/Custom_references/custom1/custom1.hmm
Mantis/References/Custom_references/custom2/custom2.dmnd

You may also redefine the custom_refs folder path by adding your preferred path to custom_refs_folder in the MANTIS.cfg file, for example:

custom_refs_folder=path/to/custom_refs/

I have also compiled other reference datasets you may use with Mantis. These might not be suitable for all use cases, so they are not included as default references. To access them please go to this url and follow the instructions on how to generate them. These are formatted to be directly plugged in with Mantis. You may of course create your own references; please feel free to use the provided code to do so.

Adding custom HMMs

If your custom HMMs are split into one HMM per file, make sure you merge them into a single file (provided they come from the same database). If HMMs from the same source are not merged, hit processing won't take into account potential overlaps between HMM hits.
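Since HMMER profile files can be combined by simple concatenation, the merge can be sketched as below. The function name merge_hmms and the folder layout are hypothetical, and the sketch assumes every .hmm file in the folder comes from the same database.

```python
from pathlib import Path


def merge_hmms(hmm_folder: str, merged_path: str) -> int:
    """Concatenate every .hmm file in hmm_folder into merged_path.

    Returns the number of files merged. Hypothetical helper: assumes all
    profiles in the folder belong to the same source database.
    """
    out = Path(merged_path)
    # Collect the sources first and skip the output file itself, in case
    # it lives in the same folder.
    sources = sorted(
        p for p in Path(hmm_folder).glob("*.hmm") if p.resolve() != out.resolve()
    )
    with out.open("w") as handle:
        for source in sources:
            text = source.read_text()
            # Make sure each profile ends with a newline before the next one starts.
            handle.write(text if text.endswith("\n") else text + "\n")
    return len(sources)
```

For example, merge_hmms("path/to/split_hmms/", "path/to/ref/custom1.hmm") would produce a single custom1.hmm that can then be added as a custom reference.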

Adding custom DMNDs

When using a list of sequences as a reference please use Diamond to generate a .dmnd file.
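As a rough illustration, the .dmnd file can be generated with Diamond's makedb command. The sketch below only wraps that call in Python; the function names and file names are hypothetical, and it assumes diamond is installed and available on your PATH.

```python
import subprocess


def diamond_makedb_command(fasta_path: str, db_prefix: str) -> list:
    """Build the `diamond makedb` call that turns a protein FASTA file
    into <db_prefix>.dmnd."""
    return ["diamond", "makedb", "--in", fasta_path, "--db", db_prefix]


def build_dmnd(fasta_path: str, db_prefix: str) -> None:
    """Run Diamond; raises CalledProcessError if the command fails."""
    subprocess.run(diamond_makedb_command(fasta_path, db_prefix), check=True)


# Example (hypothetical file names), left commented out so nothing runs on import:
# build_dmnd("path/to/ref/custom2.faa", "path/to/ref/custom2")
```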

Custom references metadata

Metadata is formatted differently from source to source, so custom references require metadata in a specific format; otherwise only the HMM/sequence name is extracted as "metadata". To see an example, go to References/custom_refs/, where you will find two files: custom.hmm and metadata.tsv.
The metadata.tsv file shows how the metadata should be formatted. The first column should contain the HMM/sequence name, and the columns that follow can contain any kind of metadata. To specify the type of metadata, add the type before the actual annotation, e.g., enzyme_ec:1.1.1.1. For the custom metadata to be recognized, place it in the same folder as the custom reference file and name it metadata.tsv, for example: path/to/custom_ref/custom123.hmm and path/to/custom_ref/metadata.tsv.
The metadata tsv files should have the following format:

REF_ID_1 enzyme_ec:2.1.15.64 description:this is a description
REF_ID_2 enzyme_ec:3.2.9.13 kegg_ko:KO0002 description:this is a description

Make sure all the reference IDs (e.g., REF_ID_1) are unique!!!

Currently Mantis uses all these ID types:

  • kegg_map_lineage
  • kegg_ko
  • description
  • kegg_cazy
  • eggnog
  • go
  • cog
  • pfam
  • tcdb
  • enzyme_ec

Please make sure you use the same format when adding your custom metadata tsv. Other links are supported but may not be properly recognized during consensus generation.
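Before plugging in a custom metadata.tsv, it can help to check it against the rules above. The following is a minimal sketch, not Mantis code: the function name check_metadata is hypothetical, and it assumes columns are tab-separated (as the .tsv extension suggests). ID types outside the list are only flagged, since they are supported but may not be recognized during consensus generation.

```python
KNOWN_ID_TYPES = {
    "kegg_map_lineage", "kegg_ko", "description", "kegg_cazy", "eggnog",
    "go", "cog", "pfam", "tcdb", "enzyme_ec",
}


def check_metadata(tsv_text: str) -> list:
    """Return a list of problems found in metadata.tsv content.

    An empty list means no problems were found. Assumes tab-separated columns.
    """
    problems = []
    seen_ids = set()
    for number, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        ref_id, *annotations = line.split("\t")
        if ref_id in seen_ids:
            problems.append(f"line {number}: duplicate reference ID {ref_id}")
        seen_ids.add(ref_id)
        for annotation in annotations:
            # Split on the first colon only, so values may contain colons.
            id_type, separator, value = annotation.partition(":")
            if not separator or not value:
                problems.append(f"line {number}: '{annotation}' is not in type:value form")
            elif id_type not in KNOWN_ID_TYPES:
                problems.append(f"line {number}: '{id_type}' is not one of the listed ID types")
    return problems
```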

Setting references weight

When generating the consensus, some references can be given more weight. This is important because some references are more specific than others. To configure the weight of a reference, simply change the MANTIS.cfg file:

  • example: for nog_ref_folder, the weight line should be nog_weight=X, where X is the weight of the HMM (0-1)
  • example: for custom_ref=path/to/customHMM.hmm, the weight line should be customHMM_weight=X, where X is the weight of the HMM (0-1)

To reiterate, to set the weight of a custom reference, add a line with the HMM file name followed by _weight. For example, if the file path is path/to/hmm/custom1.hmm, take the name of the file without the extension (custom1) and set the weight like so: custom1_weight=0.5

In essence make sure the names of the weights correspond to the name of the references.
Default weight is 0.7.
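The naming rule for custom reference weights can be sketched as follows. The helper weight_line is hypothetical and only builds the config line; it does not modify MANTIS.cfg.

```python
from pathlib import Path

DEFAULT_WEIGHT = 0.7  # default weight stated above


def weight_line(ref_path: str, weight: float = DEFAULT_WEIGHT) -> str:
    """Build the MANTIS.cfg weight line for a custom reference file:
    the file name without its extension, followed by _weight."""
    if not 0 <= weight <= 1:
        raise ValueError("weight must be between 0 and 1")
    return f"{Path(ref_path).stem}_weight={weight}"


print(weight_line("path/to/hmm/custom1.hmm", 0.5))  # custom1_weight=0.5
```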

Updating reference data

Reference data can be updated by deleting the old reference data folders (e.g., KOfam) and running mantis setup. Mantis will then download the most recent data from the respective source.

Restricting eggNOG HMMs download

If you want to use the eggNOG HMMs (instead of the eggNOG diamond database) but only need HMMs for some taxa, you can select specific taxa by inserting a list of IDs or organism names in the nog_tax= line of the MANTIS.cfg file. If an organism name is given, an automatic web search retrieves the respective NCBI ID. A lineage for each NCBI ID is then generated and all the required taxon-specific HMMs (TSHMMs) are downloaded. The nog_tax line is commented out by default.

Please keep in mind that this will also restrict the general eggNOG HMM. When downloading the full eggNOG compendium, the general eggNOG HMM will contain all non-redundant HMMs from 2157 (Archaea), 2 (Bacteria), 2759 (Eukaryota), 10239 (Viruses), 28384 (Others), and 12908 (Unclassified). However, when restricting the taxon with nog_tax, the general HMM will only contain the top-level HMMs from the selected taxa. For example, if using nog_tax=562, the general eggNOG HMM will only contain the HMMs from taxon 2 since the taxonomic lineage of the NCBI taxon 562 corresponds to 2 - 1224 - 1236 - 91347 - 543 - 561 - 562.

This will not affect the eggNOG diamond database or NPFM TSHMMs setup.
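The nog_tax=562 example above can be illustrated with a short sketch. This is not how Mantis implements it; the function name is hypothetical and the lineage is hard-coded from the example, whereas Mantis generates lineages itself.

```python
# Top-level taxa of the full general eggNOG HMM, as listed above.
TOP_LEVEL_TAXA = {
    2157: "Archaea",
    2: "Bacteria",
    2759: "Eukaryota",
    10239: "Viruses",
    28384: "Others",
    12908: "Unclassified",
}


def general_hmm_taxa(lineages: list) -> set:
    """Return the top-level taxa found in the given lineages, i.e. the taxa
    the restricted general eggNOG HMM would draw its HMMs from."""
    return {
        taxon
        for lineage in lineages
        for taxon in lineage
        if taxon in TOP_LEVEL_TAXA
    }


# Lineage of NCBI taxon 562, copied from the example above.
lineage_562 = [2, 1224, 1236, 91347, 543, 561, 562]
print(general_hmm_taxa([lineage_562]))  # {2}
```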

Sharing your conda environment

It's preferable to use a self-contained environment, avoiding compatibility issues. If you'd like to share your Mantis environment across multiple users do the following:

  1. Create the Mantis environment in a group folder location by running conda env create -n mantis_env -p <path/to/group/folder/>

Future Mantis users then need to do the following:

  2. Run conda config to generate the .condarc file
  3. Edit the .condarc file (usually located in your home folder) and add:
envs_dirs:  
    - path/to/group/folder/  

Requirements

Software requirements

If using conda to run Mantis, these are the main packages Mantis requires:

  • Python, tested with v3.7.3 but anything above v3 should be fine
  • requests, tested with v2.22.0
  • numpy, tested with v1.18.1
  • nltk, tested with v3.4.4
  • psutil, tested with v5.6.7
  • HMMER, tested with v3.2.1
  • GCC, for compilation of cython code (most systems should have it by default)

These are all installed when you run conda install -c bioconda mantis_pfa. Regardless, for reproducibility, a conda environment recipe is also available - mantis_env.yml.

Mantis can only run on Linux or macOS systems. For macOS, make sure you use Python 3.7.

Space requirements

Space requirements depend on the eggNOG database used. A diamond database (similar to the eggNOG-mapper diamond database) was recently added, which greatly reduces space requirements (from 1.5T to 130G). This database is also lineage specific. If you would like to use the legacy eggNOG HMM database, set the config line nog_ref to hmm instead of dmnd in the MANTIS.cfg file: nog_ref=hmm # dmnd or hmm

The lineage annotation with eggNOG HMMs requires a lot of space since eggNOG's HMM database is quite extensive. For the taxonomy you will need around 1.5 terabytes. The rest of the HMMs only take up around 27 gigabytes. To check the default datasets, see Reference data.
You don't need to use all of this data though!
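Before running setup, you may want to check whether the target disk has enough free space. The sketch below is not part of Mantis: the function name is hypothetical and the thresholds are only the rough figures quoted above (about 130G with the diamond database, about 1.5T with the eggNOG HMMs), so treat the result as a coarse estimate.

```python
import shutil

# Rough totals in gigabytes, taken from the figures quoted above.
APPROX_SPACE_GB = {"dmnd": 130, "hmm": 1500}


def has_enough_space(folder: str, nog_ref: str = "dmnd") -> bool:
    """Return True if the disk holding `folder` has at least the approximate
    free space needed for the chosen nog_ref setting."""
    free_gb = shutil.disk_usage(folder).free / 1e9
    return free_gb >= APPROX_SPACE_GB[nog_ref]


print(has_enough_space(".", "dmnd"))
```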

Installation

Mantis is easy to set up; simply run:

conda install -c bioconda mantis_pfa
mantis setup

To check your installation run:

mantis check

Keep in mind the installation will take a while as a lot of data is downloaded. If NOG's HMMs are not used it can finish within a couple of hours (by default a NOG diamond database is generated), otherwise it may take a few days.
To customize your installation (setting installation paths or removing certain HMMs) please refer to configuration.