
Wrapper tools for (meta)transcriptomics data generation


TYMichaelsen/MetaT


Users' Guide

MetaT is a wrapper tool for fast and simple preprocessing of NGS transcriptomics data. It combines established, fast tools (BBMap, Prokka and minimap2) behind a user-friendly command-line interface. The general workflow of the MetaT wrapper tool is:

# QC of reads.
MetaT readprep -i samples -t 10

# Annotate assembly.
MetaT annotation -a assembly.fasta -o annotation.fasta -t 10

# Map reads to annotated assembly.
MetaT mapping -i RNAreads -a annotation.fasta -o counts.txt -t 10

Importantly, the pipeline is only set up to handle transcriptomics data from 50 bp single-read (SR) Illumina libraries. Furthermore, the annotation database is currently set up for Bacteria only.

Installation

The source code for MetaT can be acquired with:

git clone https://github.com/TYMichaelsen/MetaT

Functions

readprep

Prepares 50 bp SR RNA reads for mapping by performing adapter trimming, Q-score filtering and rRNA removal.

MetaT readprep [-h] [-d dir -i file -o dir -q value -t value]

Arguments:

-h  Show this help text.
-d  Directory to search for raw Illumina SR sequencing data.
-i  File containing the list of prefixes of the files to search for.
-o  Output directory for the QC'ed reads. Default: 'RNAreads' in the current directory.
-q  Q-score threshold. Default: 20.
-t  Number of threads. Default: 10.
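To make the -q threshold concrete: a Phred score Q relates to the per-base error probability p by Q = -10 log10(p), so the default of 20 corresponds to a 1% chance of a wrong base call. The bash sketch below assumes Phred+33 encoded FASTQ qualities and uses the mean quality of a read as one possible interpretation of the filter. Which BBMap setting readprep actually applies (end trimming or a read-level cutoff) is not documented here, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: what a Q-score threshold such as the default -q 20 means.
# Assumption: Phred+33 quality encoding (standard for current Illumina FASTQ).

# Error probability for a Phred score Q: p = 10^(-Q/10).
phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

# Mean Phred score of one FASTQ quality string (hypothetical helper).
# The string is passed on stdin so awk does not interpret backslashes in it.
mean_phred() {
  printf '%s\n' "$1" | awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # ASCII lookup
    {
      n = length($0); sum = 0
      for (i = 1; i <= n; i++) sum += ord[substr($0, i, 1)] - 33   # Phred+33
      if (n > 0) printf "%.1f\n", sum / n
    }'
}
```

For example, `phred_to_error 20` prints 0.0100, and `mean_phred 'II55'` prints 30.0, so such a read would pass a mean-quality cutoff of 20.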

Output:

  1. .fasta files of curated reads in the -o directory.
  2. A file 'seqstat.txt' containing read-count statistics for each step, written to the current directory.
  3. A file 'rRNAreads.fa' containing all rRNA reads found, written to the current directory.

Note:

The -i option relies on the typical naming convention of demultiplexed Illumina output files, i.e. all files belonging to one sample (across read numbers and lanes) share a consistent prefix that is unique to that sample. Make sure this is the case: all files sharing a prefix are concatenated before downstream processing.
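The concatenation step described in the note can be sketched in bash as follows. The function name concat_by_prefix, the one-prefix-per-line format of the -i file, uncompressed FASTQ input and the .fastq output name are all assumptions for illustration, not MetaT's actual code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: concatenate all files in a directory that start with
# a given sample prefix, mirroring the documented readprep behaviour.
# Assumes one prefix per line in the list file and uncompressed FASTQ input.
concat_by_prefix() {
  local indir=$1 prefix_list=$2 outdir=$3
  local prefix
  mkdir -p "$outdir"
  while IFS= read -r prefix; do
    [ -z "$prefix" ] && continue              # skip blank lines
    # The glob expands in sorted order, so lanes/read numbers are joined consistently.
    local files=( "$indir"/"$prefix"* )
    if [ ! -e "${files[0]}" ]; then
      echo "No files found for prefix '$prefix'" >&2
      continue
    fi
    cat "${files[@]}" > "$outdir/$prefix.fastq"
  done < "$prefix_list"
}
```

A prefix such as S1 would also match files from a sample called S10, which is why the prefixes need to be unique.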

Requirements:

  • BBMap

Details:

The readprep function uses BBMap tools to perform the actual adapter trimming, Q-score filtering and rRNA removal. BBMap ships with built-in adapter references covering virtually all common NGS adapters. The SILVA Living Tree Project (LTP) database, release 132, is used as the reference for rRNA removal.

annotation

MetaT annotation [-h] [-a file -g file -d dir -o file -t value]

Arguments:

-h  Show this help text.
-a  Assembly to be annotated.
-g  Genome(s) to be annotated. Sequences in the assembly (if provided) that match these genomes are filtered away before annotation.
-d  Folder containing .gbff files for a custom database.
-o  Output file. Defaults to 'annotation.fasta'.
-t  Number of threads. Default: 10.

Output:

A FASTA file with headers of the following form:
ID contig|ftype|EC_number|gene|product|locus_tag|function|inference
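A minimal bash sketch for pulling selected fields out of these headers into a tab-separated table. It assumes the '|'-delimited layout shown above and that no field itself contains a '|' character; annotation_table is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the '|'-delimited headers of the annotation
# FASTA into a tab-separated table (ID contig, ftype, gene, product).
# Assumes no field contains a '|' character itself.
annotation_table() {
  awk -F'|' -v OFS='\t' '
    /^>/ {
      sub(/^>/, "", $1)           # drop the leading ">" from the first field
      print $1, $2, $4, $5        # ID contig, ftype, gene, product
    }' "$1"
}
```

For a made-up header such as `>g1 contig_1|CDS|1.1.1.1|adhA|alcohol dehydrogenase|PROKKA_00001|func|inf`, it prints `g1 contig_1`, `CDS`, `adhA` and `alcohol dehydrogenase` as four tab-separated columns; sequence lines are skipped.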

Requirements:

  • prokka

Details:

The annotation function uses Prokka to find ORFs and annotate them.

mapping

MetaT mapping [-h] [-i dir -a file -o file -x value -t value]

Arguments:

-h  Show this help text.
-i  Input folder containing the .fasta files to map. Must be in the current directory or one of its subfolders.
-a  Annotation .fasta file as output by the 'annotation' function in MetaT.
-o  Output file. Default: 'counts.txt' in the current directory.
-x  Identity threshold for mapping. Default: 0.95.
-t  Number of threads. Default: 10.
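One way to apply an identity threshold like -x to minimap2 output is sketched below. It assumes PAF output and defines identity as the number of matching bases (PAF column 10) divided by the alignment block length (PAF column 11). MetaT may compute identity differently, and filter_paf_identity is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of an identity filter like -x, applied to minimap2 PAF output.
# Assumption: identity = residue matches (column 10) / alignment block
# length (column 11); MetaT's exact definition may differ.
filter_paf_identity() {
  local min_id=$1 paf=$2
  # "min + 0" forces a numeric comparison; "$11 > 0" avoids division by zero.
  awk -F'\t' -v min="$min_id" '$11 > 0 && $10 / $11 >= min + 0' "$paf"
}
```

For example, `filter_paf_identity 0.95 hits.paf` keeps only alignments at or above 95% identity (hits.paf is a placeholder file name).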

Output:

The output is a tab-separated file with one row per gene and one column per sample. The first column is the unique gene ID; the second column holds the gene metadata, with individual fields separated by '|'. The remaining columns hold the counts for each sample.
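As an example of working with this table, the bash sketch below sums the counts in each sample column, e.g. to check library sizes after mapping. It assumes the file has a header row with sample names and numeric counts in every column after the second; sample_totals is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total counts per sample from counts.txt.
# Assumes a header row with sample names, gene ID in column 1,
# '|'-separated metadata in column 2, and one count column per sample after that.
sample_totals() {
  awk -F'\t' '
    NR == 1 { for (i = 3; i <= NF; i++) name[i] = $i; ncol = NF; next }
    { for (i = 3; i <= NF; i++) total[i] += $i }
    END { for (i = 3; i <= ncol; i++) print name[i] "\t" total[i] }
  ' "$1"
}
```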

Requirements:

  • minimap2
  • R
  • data.table (An R-package)

Details:

The mapping function uses minimap2 to map reads to the reference and the R package data.table to reshape the data into the count table described above.
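Assuming the reshape is a long-to-wide pivot (one row per gene, one column per sample, as in the output described above), it can be illustrated outside R. MetaT itself does this step with data.table; the awk sketch below is only an illustration, and the three-column input layout, the 'gene' header label and the function name long_to_wide are assumptions.

```bash
#!/usr/bin/env bash
# Sketch of a long-to-wide pivot, written in awk instead of R/data.table.
# Input (assumed): tab-separated lines "gene<TAB>sample<TAB>count".
# Output: one row per gene, one column per sample, missing counts as 0.
long_to_wide() {
  awk -F'\t' -v OFS='\t' '
    {
      # Remember genes and samples in first-seen order.
      if (!($1 in seen_g)) { seen_g[$1] = 1; genes[++ng] = $1 }
      if (!($2 in seen_s)) { seen_s[$2] = 1; samples[++ns] = $2 }
      count[$1 SUBSEP $2] += $3
    }
    END {
      header = "gene"
      for (j = 1; j <= ns; j++) header = header OFS samples[j]
      print header
      for (i = 1; i <= ng; i++) {
        row = genes[i]
        for (j = 1; j <= ns; j++) row = row OFS (count[genes[i] SUBSEP samples[j]] + 0)
        print row
      }
    }' "$1"
}
```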

Getting help

If you encounter bugs or have further questions or requests, you can raise an issue on the issue page.

Citing MetaT

As MetaT is still in development, no citation is available yet.

Limitations

MetaT is developed for a very specific purpose:

  • Mapping 50 bp SR Illumina reads to annotated microbial metagenomes.

For all other use cases, you're on your own!
