
ONTrack2

A Nextflow MinION-based pipeline for tracking species biodiversity

ONTrack2 is a Nextflow implementation of the ONTrack pipeline, a rapid and accurate MinION-based barcoding pipeline for tracking species biodiversity on site. Starting from MinION sequence reads in fastq format, ONTrack2 provides accurate consensus sequences in ~10 minutes per sample on a standard laptop. Compared to the original version, polishing is now performed with Racon and Medaka.


Getting started

Prerequisites

  • Nextflow
  • Docker or Singularity
  • NCBI nt database (optional, in case you want to perform a local Blast analysis of your consensus sequences)
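To check that the prerequisites are available on your system, you can print their versions; this is just a quick sanity check, not part of the pipeline:

nextflow -version
docker --version
# or, if using Singularity:
singularity --version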

For downloading the database:

mkdir NCBI_nt_db
cd NCBI_nt_db
echo $(date +%Y-%m-%d) > download_date.txt
# download all nt volumes together with their .md5 checksum files
wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt*"
# extract each volume, then remove the archive and its checksum file
for f in nt*.tar.gz; do
  tar -xzvf "$f"
  rm "$f" "$f.md5"
done
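If you want to verify each downloaded volume against its NCBI checksum before extracting it, an alternative extraction loop such as the following optional sketch can be used instead:

for f in nt*.tar.gz; do
  # extract and clean up only if the checksum matches
  md5sum -c "$f.md5" && tar -xzvf "$f" && rm "$f" "$f.md5"
done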

Installation

git clone https://github.com/MaestSi/ONTrack2.git
cd ONTrack2
chmod 755 *
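Container images are pulled automatically the first time the pipeline runs. If you prefer to pre-fetch the Docker image, it can be pulled manually; the image name below is an assumption, so check the container directive in ONTrack2.conf for the exact name:

# hypothetical image name, verify against ONTrack2.conf
docker pull maestsi/ontrack2:latest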

Overview

[Pipeline overview diagram]

Usage

The ONTrack2 pipeline requires you to open the ONTrack2.conf configuration file and set the desired options. Then you can run the pipeline with either the Docker or Singularity container engine by specifying a value for the -profile parameter.

Usage:
nextflow -c ONTrack2.conf run ONTrack2.nf --fastq_files="/path/to/files*.fastq" --scripts_dir="/path/to/scripts_dir" --results_dir="/path/to/results_dir" -profile docker

Mandatory argument:
-profile                      Configuration profile to use. Available: docker, singularity

Other mandatory arguments, which may be specified in the ONTrack2.conf file:
--fastq_files                 Path to fastq files; use wildcards to select multiple samples
--results_dir                 Path to the folder where results are stored
--scripts_dir                 Path to the directory containing all scripts
--subsampling_flag            Set to true to subsample reads and reduce running time
--subsampled_reads            Number of reads subsampled for each sample when subsampling_flag = true
--minQ                        Minimum Q value for read filtering
--minLen                      Minimum read length for read filtering
--maxLen                      Maximum read length for read filtering
--target_reads_consensus      Maximum number of reads used for consensus calling
--target_reads_polishing      Maximum number of reads used for consensus polishing
--clustering_id_threshold     Identity threshold for clustering in the preliminary allele assembly
--plurality                   Cut-off on the number of positive matches in the multiple sequence alignment below which there is no consensus
--fast_alignment_flag         Set to 1 to perform fast multiple sequence alignment, 0 otherwise
--primers_length              Number of bases trimmed from the consensus sequences (primer length)
--medaka_model                Medaka model used for consensus polishing
--blast_db                    Path to the Blast-indexed database used for Blasting consensus sequences
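As an example, the following invocation uses the singularity profile and overrides a few parameters on the command line; all paths and values are purely illustrative:

nextflow -c ONTrack2.conf run ONTrack2.nf \
  --fastq_files="/path/to/fastq/BC*.fastq" \
  --results_dir="/path/to/results_dir" \
  --scripts_dir="/path/to/ONTrack2" \
  --minQ=7 --minLen=450 --maxLen=800 \
  -profile singularity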

To run the analysis in interactive mode straight after live basecalling and demultiplexing, the helper script Run_ONTrack2.R is also available; it concatenates the fastq files for each barcode and then runs the ONTrack2 pipeline on each resulting file using the Docker profile. The script should be executed with Rscript.
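If you prefer to prepare the input manually, the per-barcode concatenation performed by the script corresponds roughly to the sketch below, assuming the standard fastq_pass/barcodeXX layout produced by live basecalling and demultiplexing (paths are illustrative):

for d in /path/to/fastq_pass/barcode*; do
  bc=$(basename "$d")
  # merge all fastq files of this barcode into a single per-sample file
  cat "$d"/*.fastq > /path/to/analysis_dir/"$bc".fastq
done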

If you wish to run ONTrack2 on a Windows laptop, you can install Ubuntu via WSL and run the pipeline by following this tutorial.
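As a starting point on recent Windows 10/11 systems, Ubuntu can be installed from an elevated PowerShell prompt with the command below; Nextflow and Docker (or Singularity) are then installed inside the Ubuntu environment as described above:

wsl --install -d Ubuntu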

Citation

This pipeline was designed and implemented by Prof. Massimo Delledonne and Simone Maestri.

If this tool is useful for your work, please consider citing our manuscript.

Maestri S, Cosentino E, Paterno M, Freitag H, Garces JM, Marcolungo L, Alfano M, Njunjić I, Schilthuizen M, Slik F, Menegon M, Rossato M, Delledonne M. A Rapid and Accurate MinION-Based Workflow for Tracking Species Biodiversity in the Field. Genes. 2019; 10(6):468.

For further information and insights into pipeline development, please have a look at my doctoral thesis.

Maestri, S (2021). Development of novel bioinformatic pipelines for MinION-based DNA barcoding (Doctoral thesis, Università degli Studi di Verona, Verona, Italy). Retrieved from https://iris.univr.it/retrieve/handle/11562/1042782/205364/.

Side notes

As a real-life Pokédex, the workflow described in our manuscript will facilitate tracking biodiversity in remote and biodiversity-rich areas. For instance, during a Taxon Expedition to Borneo, our analysis confirmed the novelty of a beetle species named after Leonardo DiCaprio.
