This pipeline maps long DNA sequencing reads to a reference genome and evaluates the performance of a Cas9-based target enrichment strategy. The workflow is suitable for Oxford Nanopore FASTQ read collections and requires a reference genome and a BED file of target coordinates. The pipeline is built with Nextflow, a workflow tool that runs tasks portably across multiple compute infrastructures. It comes with Docker and Singularity containers, which make installation simple and results highly reproducible.
The current workflow consists of:
- Mapping of the reads onto a reference genome
- Handling of the generated SAM files and conversion into BAM files
- Evaluation of the performance of the enrichment
- Separation of reads into different files according to their mapping status
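
To illustrate the last two steps, the following is a minimal Python sketch of how aligned reads could be classified as unmapped, on-target, or off-target from SAM records and a BED file. It is not the pipeline's own implementation (the pipeline relies on R scripts from ont_tutorial_cas9); the function names `parse_bed`, `reference_span`, and `classify` are hypothetical, and "on-target" is assumed here to mean that the alignment overlaps a target interval by at least one base.

```python
from collections import defaultdict


def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} from BED lines (0-based, half-open)."""
    targets = defaultdict(list)
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        targets[chrom].append((int(start), int(end)))
    return targets


def reference_span(cigar):
    """Count reference bases consumed by a CIGAR string (ops M, D, N, =, X)."""
    span, num = 0, ""
    for ch in cigar:
        if ch.isdigit():
            num += ch
        else:
            if ch in "MDN=X":
                span += int(num)
            num = ""
    return span


def classify(sam_line, targets):
    """Classify one SAM alignment line (assumed rule: any overlap is on-target)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4:  # SAM flag bit 0x4: read is unmapped
        return "unmapped"
    start = pos - 1  # SAM POS is 1-based; BED coordinates are 0-based
    end = start + reference_span(cigar)
    for t_start, t_end in targets.get(chrom, []):
        if start < t_end and t_start < end:
            return "on_target"
    return "off_target"


if __name__ == "__main__":
    # Toy data for illustration only; not output from the pipeline.
    targets = parse_bed(["chr1\t100\t200\ttarget_A"])
    reads = [
        "r1\t0\tchr1\t151\t60\t50M\t*\t0\t0\t*\t*",
        "r2\t0\tchr1\t201\t60\t50M\t*\t0\t0\t*\t*",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    for read in reads:
        print(read.split("\t")[0], classify(read, targets))
```

In a real run the SAM header lines (starting with `@`) would need to be skipped first, and a dedicated library or tool would normally replace the hand-written parsing.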
The pipeline comes with documentation, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output
This pipeline was written by Tristan Kast (tristankast) at DZNE, using R scripts from the nanoporetech ont_tutorial_cas9 GitHub repository (https://github.com/nanoporetech/ont_tutorial_cas9).