MINTIA - Metagenomic INserT BIoinformatic Annotation

Description

Functional metagenomics is used to understand who is doing what in microbial ecosystems. DNA sequencing can be prioritized by activity-based screening of libraries obtained by cloning and expressing metagenomic DNA fragments in an heterologous host. When large insert libraries are used, allowing a direct access to the functions encoded by entire metagenomic loci sizing several dozens of kbp, NGS is required to identify the genes that are responsible for the screened function. MINTIA is an easy to use pipeline assembling and annotating metagenomic inserts.

The assembly module (assemble) assembles, cleans and extracts the longest and most covered contigs for each DNA insert. It handles reads (454, ion torrent,...) or read pairs (Illumina,...). This tools is not able to process PacBio or Oxford Nanopore reads. It sub-selects by default 300X read coverage depending on the sequencing platform and assembles them, removes cloning vector and selects best contigs. Only contigs with length and average depth over given thresholds are kept. The produced HTML report includes a dynamic graphic with contig length and coverage for each sample.

The annotation module (annotate) aims at obtaining main gene functions and a functional classification. The pipeline launches at least prokka for ORF detection, generating fasta files for genes and proteins as well as a tabular file containing ORFs description. Depending on additional options selected, contigs and ORFs are aligned against NCBI NR (Non Redundant) as well as SP (SwissProt) and COGss databases. The produced HTML report includes all results and allow to explore annotations based on igv.js an embeddable interactive genome visualization component.

Installation

This MINITA repository is for command line user.
For an easy install, conda environment is recommended. To install Miniconda, follow this tutorial or, to install Ananconda, follow this one.

Install

Clone this repository:

$ git clone --recursive https://github.com/Bios4Biol/MINTIA.git

Use conda to install the third party software:

$ cd MINTIA
$ ./setup.sh

Two dependencies will not be installed by conda and must be installed "manually":

cross_match required for the assemble module (step1): cross_match
MEGAN5 (optional) for the annotate module (step2):
- xvfb-run
- megan5

Activate the mintia environment:

$ conda activate mintia

Tools dependencies can be checked:

$ mintia check
##############################################
        Mintia_v1.0 check dependencies
##############################################

- Step 1 - assemble:
  => cutadpt..........ok...version:2.10
  => spades...........ok...version:v3.14.1
  => cross_match......ok...version:1.090518

- Step 2 - annotate:
  => prokka...........ok...version:1.14.6
  => diamond..........ok...version:0.9.29
  => xvfb-run.........ok
  => MEGAN............ok...version:5.11.3
  => rpsblast.........ok...version:2.9.0
  => samtools.........ok...version:1.9
  => tabix............ok...version:1.9

Tools dependencies

Tools	Tested version	Tools	Tested version
`cross_match`	`1.090518`	`bgzip`
`spades`	`v3.14.1`	`file`	`file-5.04`
`prokka`	`1.14.6`	`grep`	`GNU grep 2.6.3`
`diamond`	`0.9.29`	`which`
`megan`	`5.11.3`	`xvfb-run`
`samtools`	`1.9`	`tabix`	`1.9`
`rpsblast`	`2.9.0`
`cutadapt`	`2.10`

Databanks

Reference databases are required for the "annotate" module.

NR: non-redundant protein database indexed for Diamond (used by -F and -M see below)
Uniprot/Swissprot: protein database indexed for Diamond (used by -F see below)
COGs: database of Clusters of Orthologous Groups of proteins, input of rpsblast (used by -C see below)

Create COGs DB

$ wget ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd//cdd.tar.gz
$ tar -xvzf cdd.tar.gz
$ makeprofiledb -title COG.3-28-17 -in Cog.pn -out Cog.v3-28-17 -threshold 9.82 -scale 100.0 -dbtype rps -index true

Run MINTIA

Activate the mintia environment

$ conda activate mintia

Check tools dependencies

$ mintia check -h
Name:
     mintia - Fosmid assembly and annotation pipeline.

Check Synopsis:
     mintia check

Check Options:
    -h, --help
             Print help

Assemble

$ mintia assemble -h
Name:
     mintia - Fosmid assembly and annotation pipeline.

Assemble Synopsis:
     mintia assemble -i FASTQ_FILE[S] -v FASTA_FILE -d STR

Assemble Options:
    -i, --input FILE[S]
             Fastq(.gz) file(s)
             For each sample one OR two fastq file must be provided:
             - Paired data must contain R1/R2: R[12].f[ast]q[.gz]
               Ex: sampleName1_R1.fastq sampleName1_R2.fastq
                   sampleName2_R1.fq.gz sampleName2_R2.fq.gz
             - Single data
               Ex: sampleName.f[ast]q[.gz]

    -v, --vectorSeq FILE
             Vector fasta file

    --length INT
             Fosmid's expected length [40000]

    --minimalContigLength INT
             Contig's minimum length [1000]

    --minimalContigDepth INT
             Contig's minimum depth [8]

    -c, --maxDepth INT
             Coverage, maximum depth use to filter input reads [300]

    -d, --dirOutputs STR
             Path to the outputs directory

    -H, --htmlOutput STR
             HTML output name [mintia_assemble.html]

    -L, --logOutput STR
             Log output file name [mintia_assemble.log]

    -t, --threads
             number of threads for SPADES [8]

    -h, --help
             Print help

Example based on test data:

mintia assemble -t 1 -i Data/Input/Assemble/BifidoAdolescentis.s*gz -v Data/Input/Assemble/pCC1FOS.fasta -len 40000 -c 300 -d Data/Output/Assemble/

Annotate

$ mintia annotate -h
Name:
     mintia - Fosmid assembly and annotation pipeline.

Annotate Synopsis:
     mintia annotate -i FASTA_FILE -n NR_DMND_FILE -u UNIPROT_DMND_FILE -F -d STR

Annotate Options:
    -i, --input FILE
             Fasta(.gz) file

    -s, --separator CHAR [#]
             Which separator allows retreiving the fosmid name
             This separator will be also use to create ORF id
             Example: >fosmidName1#contig1... |
                      >fosmidName1#contig2... | => fosmidName1
                      >fosmidName1#contig3... |

    -n, --nrDB FILE
             Non-redundant proteins database indexed for Diamond (Ex: the nr.dmnd)
             Required by -F.

    -u, --uniprotDB FILE
             Proteic sequence database indexed for Diamond (Ex: the uniprot_sprot.dmnd)
             Required by -F and -M.

    -F, --FunctionalAndTaxonomic
             Run functional and taxonomic annotations
             -n, --nrDB and -u, --uniprotDB must be provided

    -e, --evalue FLOAT
             Maximum diamond e-value to report alignments [10e-8]

    -q, --queryCover INT
             Minimum diamond query cover% to report an alignment [50]

    -M, --Megan FILE
             Run MEGAN - A license file must be provided
             -n, --nrDB must be provided

    -C, --Cog FILE
             Run annotations with COGs, DB COGs path file

    -c, --cMaxEvalue FLOAT
             Max Evalue for Rps-Blast COGs filtering [10e-8]

    -S, --SubmissionFiles
             Build submission files

    -D, --DiamondAgainstPrivateDB FILE
             Run diamond against your own protein reference FASTA file

    -t, --threads INT
             Number of threads for Blast [8]

    -d, --dirOutputs STR
             Path to the outputs directory

    -H, --htmlOutput STR
             HTML output name [mintia_annotate.html]

    -L, --logOutput STR
             Log output file name [mintia_annotate.log]

    -k, --keepTmpFiles
             Keep temporary files

    -h, --help
             Print help

Example based on test data:

# Without Megan and COG annotations
mintia annotate -F -i Data/Output/Assemble/mintia_assemble.fasta -d  Data/Output/Annotate2/ -t 1 -S -n Data/Input/Annotate/NR_subset4test.dmnd -u Data/Input/Annotate/UNIPROT-SP_subset4test.dmnd

# With Megan and COG annotations
mintia annotate -F -C PATH_TO_COG_DB/Cog.v3-28-17 -M PATH_TO_YOUR_LICENSE/MEGAN5-academic-license.txt - Data/Output/Assemble/mintia_assemble.fasta -d  Data/Output/Annotate2/ -t 1 -S -n Data/Input/Annotate/NR_subset4test.dmnd -u Data/Input/Annotate/UNIPROT-SP_subset4test.dmnd

License

GNU GPL v3

Copyright

2020 INRAE

Contact

support.sigenae@inra.fr

Name		Name	Last commit message	Last commit date
Latest commit History 157 Commits
Data		Data
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yaml		environment.yaml
mintia		mintia
mintia.pl		mintia.pl
mintia_annotate_pipeline.png		mintia_annotate_pipeline.png
mintia_assemble_pipeline.png		mintia_assemble_pipeline.png
set_env_config.sh		set_env_config.sh
setup.sh		setup.sh
unset_env_config.sh		unset_env_config.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MINTIA - Metagenomic INserT BIoinformatic Annotation

Description

Table of content

Installation

Install

Tools dependencies

Databanks

Run MINTIA

Activate the mintia environment

Check tools dependencies

Assemble

Annotate

License

Copyright

Contact

About

Releases

Packages

Contributors 2

Languages

License

Bios4Biol/MINTIA

Folders and files

Latest commit

History

Repository files navigation

MINTIA - Metagenomic INserT BIoinformatic Annotation

Description

Table of content

Installation

Install

Tools dependencies

Databanks

Run MINTIA

Activate the mintia environment

Check tools dependencies

Assemble

Annotate

License

Copyright

Contact

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages