MetaT is a wrapper tool for fast and simple preprocessing of NGS transcriptomics data. It combines recent, fast tools (BBMap, prokka and minimap2) behind a single user-friendly interface. The general workflow of MetaT is:
# QC of reads.
MetaT readprep -i samples -t 10
# Annotate assembly.
MetaT annotation -a assembly.fasta -o annotation.fasta -t 10
# Map reads to annotated assembly.
MetaT mapping -i RNAreads -a annotation.fasta -o counts.txt -t 10
Importantly, the pipeline is currently set up only for transcriptomics data from 50 bp single-read (SR) Illumina libraries. Furthermore, the annotation database currently covers Bacteria only.
The source code for MetaT can be acquired with:
git clone https://github.com/TYMichaelsen/MetaT
Prepare 50 bp SR RNA reads for mapping by performing adapter trimming, Q-score filtering and rRNA removal.
MetaT readprep [-h] [-d dir -i file -o dir -q value -t value]
Arguments:
-h Show this help text.
-d Directory to search for raw Illumina SR sequencing data.
-i File containing the list of file prefixes to search for.
-o Output directory for the QC'ed reads. Default: 'RNAreads' in the current directory.
-q Q-score threshold. Default: 20.
-t Number of threads. Default: 10.
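For reference, the -q threshold is on the Phred scale, where a score Q corresponds to a per-base error probability p = 10^(-Q/10); the default of 20 therefore means about a 1% chance that a base call is wrong. The Python sketch below illustrates the scale and a simple mean-quality check on a Phred+33 quality string. The helper names and the mean-quality rule are hypothetical and only illustrate the idea; they are not how BBMap applies the threshold internally.

```python
def phred_to_error(q: float) -> float:
    """Per-base error probability for Phred score Q: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(qual: str) -> list[int]:
    """Decode a FASTQ quality string with the Phred+33 offset (Illumina 1.8+)."""
    return [ord(c) - 33 for c in qual]


def passes_mean_quality(qual: str, threshold: int = 20) -> bool:
    """Hypothetical read-level rule: keep a read if its mean Phred score >= threshold."""
    scores = decode_phred33(qual)
    return bool(scores) and sum(scores) / len(scores) >= threshold


print(phred_to_error(20))            # error probability at the default -q 20
print(passes_mean_quality("IIII#"))  # True: mean of [40, 40, 40, 40, 2] is 32.4
```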
Output:
- .fasta files of curated reads in the -o directory.
- A file 'seqstat.txt' containing read count statistics for each step. Written to the current directory.
- A file 'rRNAreads.fa' containing all rRNA reads found. Written to the current directory.
Note:
The -i option relies on the typical naming convention of demultiplexed Illumina output files: all files belonging to one sample (e.g. different read numbers or lanes) share a prefix that is consistent and unique to that sample. Make sure this holds for your data, because all files with the same prefix are concatenated before downstream processing.
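To illustrate the prefix convention, the Python sketch below groups the files in a directory by sample prefix and concatenates each group into one file, as the note describes. The gzipped FASTQ extension, the output naming and the helper name `concatenate_by_prefix` are assumptions for illustration; MetaT's own implementation may differ.

```python
import shutil
from pathlib import Path


def concatenate_by_prefix(raw_dir: Path, prefixes: list[str], out_dir: Path) -> dict[str, list[Path]]:
    """Concatenate all files in raw_dir whose names start with each prefix.

    Hypothetical helper: assumes gzipped FASTQ input. Gzip files can be joined
    byte-for-byte, since a multi-member gzip stream is still valid gzip.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    groups = {}
    for prefix in prefixes:
        # A prefix such as "S1" would also match "S10_...", which is why
        # prefixes must be unique per sample.
        matches = sorted(
            p for p in raw_dir.iterdir() if p.is_file() and p.name.startswith(prefix)
        )
        if not matches:
            raise FileNotFoundError(f"no files found for prefix {prefix!r}")
        with open(out_dir / f"{prefix}.fastq.gz", "wb") as out:
            for path in matches:  # sorted, so lanes/read numbers keep a stable order
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        groups[prefix] = matches
    return groups
```

For example, `concatenate_by_prefix(Path("raw"), ["S1_", "S2_"], Path("concatenated"))` would write one combined file per sample; the trailing underscore in the prefixes is one simple way to keep "S1" from also matching "S10".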
Requirements:
- BBMap
Details:
The readprep function uses BBMap tools to perform the actual adapter trimming, Q-score filtering and rRNA removal. BBMap ships with built-in adapter sequences that cover essentially all common NGS adapters. Release 132 of the SILVA Living Tree Project (LTP) database is used as the reference for rRNA removal.
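BBMap's BBDuk tool removes contaminants by k-mer matching: a read is flagged if it shares a k-mer with the reference. Assuming MetaT relies on this kind of matching against SILVA, the Python sketch below shows the idea in its simplest form. The toy sequences, the k-mer size of 5 and the function names are illustrative only; real runs use much longer k-mers and the full reference, and BBDuk offers options (e.g. mismatch tolerance) not modelled here.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_index(references: list[str], k: int) -> set[str]:
    """Index k-mers from both strands so reads from either strand can match."""
    index = set()
    for ref in references:
        index |= kmers(ref, k) | kmers(revcomp(ref), k)
    return index


def split_reads(reads: list[str], index: set[str], k: int) -> tuple[list[str], list[str]]:
    """Return (kept, rrna): a read is flagged as rRNA if any of its k-mers is indexed."""
    kept, rrna = [], []
    for read in reads:
        (rrna if kmers(read, k) & index else kept).append(read)
    return kept, rrna


# Toy stand-in for rRNA reference sequences.
index = build_index(["ACGTACGTTTGACC"], k=5)
kept, rrna = split_reads(["TTGACCAA", "GGGGGGGG", "GGTCAAAC"], index, k=5)
print(kept)  # ['GGGGGGGG']; the third read matches the reverse strand
```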
MetaT annotation [-h] [-a file -g file -d dir -o file -t value]
Arguments:
-h Show this help text.
-a Assembly to be annotated.
-g Genome(s) to be annotated. Sequences in the assembly (if provided) that match these genomes are filtered out before annotation.
-d Folder containing .gbff files for a custom annotation database.
-o Output file. Defaults to 'annotation.fasta'.
-t Number of threads. Default: 10.
Output:
A FASTA file with headers in the following format:
ID contig|ftype|EC_number|gene|product|locus_tag|function|inference
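A minimal Python sketch for splitting such a header into its parts, assuming the ID is separated from the metadata by the first whitespace and that the metadata consists of exactly the eight '|'-separated fields listed above. The example header values and the function name `parse_header` are made up for illustration.

```python
FIELDS = ("contig", "ftype", "EC_number", "gene", "product",
          "locus_tag", "function", "inference")


def parse_header(header: str) -> tuple[str, dict[str, str]]:
    """Split '>ID contig|ftype|...|inference' into the gene ID and a field dict.

    Assumption: the first whitespace separates the ID from the metadata, so
    spaces inside fields such as 'product' are preserved.
    """
    gene_id, meta = header.strip().lstrip(">").split(maxsplit=1)
    values = meta.split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return gene_id, dict(zip(FIELDS, values))


# Made-up header for illustration; empty fields stay as empty strings.
gene_id, info = parse_header(
    ">gene_0001 contig_12|CDS||dnaA|Chromosomal replication initiator protein DnaA"
    "|ABC_00001||ab initio prediction:Prodigal"
)
print(gene_id, info["gene"], repr(info["EC_number"]))  # gene_0001 dnaA ''
```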
Requirements:
- prokka
Details:
The annotation function uses prokka to find ORFs and annotate them.
MetaT mapping [-h] [-i dir -a file -o file -x value -t value]
Arguments:
-h Show this help text.
-i Input folder containing the .fasta files to map. Must be located in the current directory or one of its subfolders.
-a Annotation .fasta file as output by the 'annotation' function in MetaT.
-o Output file. Default: 'counts.txt' in the current directory.
-x Identity threshold for mapping (default: 0.95).
-t Number of threads (default: 10).
Output:
The output is a tab-separated file with one row per gene. The first column is the unique gene ID, the second column holds the gene metadata with fields separated by '|', and each remaining column contains the counts for one sample.
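A minimal Python sketch for loading this table with the standard library. It assumes the file starts with a header row (the column names are not documented here) and that the sample columns hold integer counts; the example rows and the function name `read_counts` are made up. When reading the real file, open it with `open("counts.txt", newline="")`, as the csv module recommends.

```python
import csv
import io


def read_counts(handle):
    """Parse the tab-separated counts table into (samples, {gene_id: {...}}).

    Assumes a header row; column 1 is the gene ID, column 2 the
    '|'-separated metadata, and each remaining column one sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[2:]
    genes = {}
    for row in reader:
        gene_id, meta, *counts = row
        genes[gene_id] = {
            "metadata": meta.split("|"),
            "counts": {s: int(c) for s, c in zip(samples, counts)},
        }
    return samples, genes


# Hypothetical two-sample file with one gene.
example = io.StringIO(
    "ID\tmetadata\tsampleA\tsampleB\n"
    "gene_0001\tcontig_12|CDS||dnaA|replication initiator|ABC_00001||Prodigal\t5\t0\n"
)
samples, genes = read_counts(example)
print(genes["gene_0001"]["counts"])  # {'sampleA': 5, 'sampleB': 0}
```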
Requirements:
- minimap2
- R
- data.table (An R-package)
Details:
The mapping function uses minimap2 to map reads to the reference and the R package data.table to reshape the data.
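To make the mapping step more concrete, here is a rough Python sketch of what could happen after minimap2 has run: read its PAF output (a short-read call might look like `minimap2 -x sr -t 10 annotation.fasta sample.fasta > sample.paf`, though MetaT's exact options are not documented here), keep primary alignments whose identity reaches the -x threshold, count reads per gene, and reshape the per-sample counts into the genes-by-samples layout that data.table produces. The identity definition (PAF column 10 divided by column 11), the primary-only rule and the function names are assumptions of this sketch, not MetaT's documented behaviour.

```python
from collections import Counter


def count_paf(lines, min_identity=0.95):
    """Count primary alignments per target gene with identity >= min_identity.

    Identity here = PAF column 10 (matching bases) / column 11 (alignment
    block length); this definition is an assumption for the sketch.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if "tp:A:S" in fields[12:]:  # skip secondary alignments (minimap2 'tp' tag)
            continue
        matches, block_len = int(fields[9]), int(fields[10])
        if block_len > 0 and matches / block_len >= min_identity:
            counts[fields[5]] += 1  # column 6 = target (gene) name
    return counts


def to_wide(per_sample):
    """Reshape {sample: Counter(gene -> n)} into a genes-by-samples table."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    rows = [[gene] + [per_sample[s][gene] for s in samples] for gene in genes]
    return samples, rows
```

With one PAF file per sample, `to_wide` returns rows of a gene ID followed by one count per sample (genes absent from a sample get 0 because Counter returns 0 for missing keys); the real MetaT output additionally carries the metadata column.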
If you encounter bugs or have further questions or requests, please raise an issue on the project's GitHub issue page.
As MetaT is still in development, no citation is available yet.
MetaT is developed for a very specific purpose:
- Mapping 50 bp SR Illumina reads to annotated microbial metagenomes.
For all other use cases, you're on your own!