Reproducible pipeline for the analysis of metabarcoding data generated by either Sanger or NGS approaches.
metaBEAT relies on a number of external programs. To make your life easier, we have created a self-contained environment with all necessary software in a Docker image. This image builds on ReproPhylo. To use it, you need Docker installed on your machine.
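Before going further, it can help to confirm that Docker is actually available on your PATH. The following is a minimal sketch: the helper name require_cmd is our own and not part of metaBEAT, and the suggested pull command assumes the image name chrishah/metabeat that is used in the commands in this README.

```shell
#!/bin/sh
# Hypothetical helper (not part of metaBEAT): succeeds if a command is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if require_cmd docker; then
    # This only prints a hint; nothing is downloaded by this script.
    echo "Docker found. Fetch the image with: sudo docker pull chrishah/metabeat"
else
    echo "Docker not found. Please install Docker first."
fi
```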
Run the metaBEAT script in the container (you can process data in your current working directory or its subdirectories):
sudo docker run --rm --net=host --name metaBEAT -v $(pwd):/home/working chrishah/metabeat metaBEAT_global.py -h
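If you call metaBEAT this way often, a small shell function saves retyping the docker options. This is a convenience sketch, not something shipped with metaBEAT: the function name metabeat_run and the METABEAT_DRY_RUN switch are made up for illustration. The docker options themselves are copied from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of metaBEAT) around the docker command above.
# With METABEAT_DRY_RUN=1 it only prints the command instead of running it.
metabeat_run() {
    if [ "${METABEAT_DRY_RUN:-0}" = "1" ]; then
        echo sudo docker run --rm --net=host --name metaBEAT \
            -v "$(pwd):/home/working" chrishah/metabeat "$@"
    else
        sudo docker run --rm --net=host --name metaBEAT \
            -v "$(pwd):/home/working" chrishah/metabeat "$@"
    fi
}

# Print the command that would display the usage of metaBEAT_global.py.
METABEAT_DRY_RUN=1 metabeat_run metaBEAT_global.py -h
```

Dropping the METABEAT_DRY_RUN=1 prefix runs the command for real, in which case Docker and the image must be available.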
In a terminal window, mount your current working directory into the Docker container and enter the self-contained environment using a shell:
sudo docker run -i -t --net=host --name metaBEAT -v $(pwd):/home/working chrishah/metabeat /bin/bash
Or access the container via a Jupyter notebook by running the start_metaBEAT_nb script and passing it the full path to your desired mount point, e.g.:
./start_metaBEAT_nb $(pwd) --xt
This will open a Jupyter notebook in a new tab in your default browser. First, the browser will warn you that your connection is not private. Click on "Advanced" at the bottom left and then on "Proceed to localhost (unsafe)". You will then be asked for a password, which is simply "password". Entering the password correctly opens the Jupyter notebook and you are good to go.
Once you are done, you should stop the container by simply running:
stop_metaBEAT_nb
Within the environment you can then execute the scripts that come with metaBEAT, e.g.:
metaBEAT_global.py
Executing a script without any options will usually display the usage, e.g.:
usage: metaBEAT_global.py [-h] [-Q <FILE>] [-v] [-s] [-f] [-p] [-t] [-b]
[-m <string>] [-n <INT>] [-E] [-e] [--PCR_primer <FILE>]
[--trim_adapter <FILE>] [--trim_qual <INT>]
[--trim_window <INT>] [--trim_minlength <INT>] [--merge]
[--product_length <INT>] [--phred <INT>] [-R <FILE>]
[--gb_out <FILE>] [--rec_check] [--cluster]
[--clust_match <FLOAT>] [--clust_cov <INT>] [--www]
[--min_ident <FLOAT>] [--min_bit <INT>] [--refpkg <DIR>]
[-o OUTPUT_PREFIX] [--metadata METADATA] [--mock_meta_data]
[--version]
metaBEAT - metaBarcoding and Environmental DNA Analyses tool
optional arguments:
-h, --help show this help message and exit
-Q <FILE>, --querylist <FILE>
file containing a list of query files
-v, --verbose turn verbose output on
-s, --seqinfo write out seq_info.csv file
-f, --fasta write out ref.fasta file
-p, --phyloplace perform phylogenetic placement
-t, --taxids write out taxid.txt file
-b, --blast compile local blast db and blast queries
-m <string>, --marker <string>
marker ID (default: marker)
-n <INT>, --n_threads <INT>
Number of threads (default: 1)
-E, --extract_centroid_reads
extract centroid reads to files
-e, --extract_all_reads
extract reads to files
--version show program's version number and exit
Query preprocessing:
The parameters in this group affect how the query sequences are processed
--PCR_primer <FILE> PCR primers (provided in fasta file) to be clipped
from reads
--trim_adapter <FILE>
trim adapters provided in file
--trim_qual <INT> minimum phred quality score (default: 30)
--trim_window <INT> sliding window size (default: 5) for trimming; if
average quality drops below the specified minimum
quality all subsequent bases are removed from the
reads
--trim_minlength <INT>
minimum length of reads to be retained after trimming
(default: 50)
--merge attempt to merge paired-end reads
--product_length <INT>
estimated length of PCR product (default: 100)
--phred <INT> phred quality score offset - 33 or 64 (default: 33)
Reference:
The parameters in this group affect the reference to be used in the
analyses
-R <FILE>, --REFlist <FILE>
file containing a list of files to be used as
reference sequences
--gb_out <FILE> output the corrected gb file
--rec_check check records to be used as reference
Query clustering options:
The parameters in this group affect read clustering
--cluster perform clustering of query sequences using vsearch
--clust_match <FLOAT>
identity threshold for clustering in percent (default:
1)
--clust_cov <INT> minimum number of records in cluster (default: 1)
BLAST search:
The parameters in this group affect BLAST search and BLAST based taxonomic
assignment
--www perform online BLAST search against nt database
--min_ident <FLOAT> minimum identity threshold in percent (default: 0.95)
--min_bit <INT> minimum bitscore (default: 80)
Phylogenetic placement:
The parameters in this group affect phylogenetic placement
--refpkg <DIR> PATH to refpkg
BIOM OUTPUT:
The arguments in this groups affect the output in BIOM format
-o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
prefix for BIOM output files (default='metaBEAT')
--metadata METADATA comma delimited file containing metadata (optional)
--mock_meta_data add mock metadata to the samples in the BIOM output
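To see how these options combine, here is a hedged example that assembles one possible command line from the flags listed above: quality trimming, merging of paired-end reads, clustering, and a local BLAST search. The file names Querymap.txt and REFmap.txt and the marker name COI are placeholders, not files shipped with metaBEAT, and the function name metabeat_example_cmd is ours. The option values are illustrative rather than recommended settings. Note that the help text describes --clust_match and --min_ident as "in percent" but lists defaults of 1 and 0.95, which suggests the values are passed as fractions. That reading is an inference from the defaults, so check it against your version.

```shell
#!/bin/sh
# Hypothetical helper: print an example metaBEAT call for review.
#   $1 = file containing a list of query files (-Q)
#   $2 = file containing a list of reference files (-R)
metabeat_example_cmd() {
    echo metaBEAT_global.py \
        -Q "$1" -R "$2" \
        --trim_qual 30 --trim_window 5 --trim_minlength 50 \
        --merge --product_length 100 \
        --cluster --clust_match 1 --clust_cov 1 \
        --blast --min_ident 0.95 \
        -m COI -n 4 -v -o metaBEAT
}

# Placeholder input names; replace them with your own files.
metabeat_example_cmd Querymap.txt REFmap.txt
```

The printed command is meant to be run inside the container once the inputs exist in the mounted working directory.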
VERSIONS
v. 0.6:
- docker image for this version is: chrishah/metabeat:v0.6
- used for Kitson et al. 2015
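To reproduce an analysis with a specific release, you can pin the image tag rather than relying on the default tag. A small sketch, assuming the v0.6 tag listed above; the variable name METABEAT_IMAGE is ours, and the script only prints the command.

```shell
#!/bin/sh
# Pin the image to the tag listed for v. 0.6 (variable name is illustrative).
METABEAT_IMAGE="chrishah/metabeat:v0.6"

# Print a pinned variant of the docker command used earlier in this README.
echo "sudo docker run --rm --net=host --name metaBEAT -v \$(pwd):/home/working $METABEAT_IMAGE metaBEAT_global.py --version"
```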