This repository has been archived by the owner on Jun 26, 2019. It is now read-only.

crickbabs/BABS-MNASeqPE

BABS-MNASeqPE

Introduction

A Nextflow pipeline for processing paired-end Illumina MNase-seq (MNASeq) data.

The pipeline was written by The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.

Pipeline summary

  1. Raw read QC (FastQC, Fastq Screen)
  2. Adapter trimming (cutadapt)
  3. Alignment (BWA)
  4. Mark duplicates (picard)
  5. Filtering to remove:
    • reads that are marked as duplicates (SAMtools)
    • reads that aren't marked as primary alignments (SAMtools)
    • reads that are unmapped (SAMtools)
    • reads that map to multiple locations (SAMtools)
    • reads containing > 3 mismatches in either read of the pair (BAMTools)
    • reads that fail the user-defined insert size filter (BAMTools)
    • reads that are soft-clipped (BAMTools)
    • reads that map to different chromosomes (Pysam)
    • reads that aren't in FR orientation (Pysam)
    • read pairs where only one read of the pair fails the above criteria (Pysam)
  6. Merge alignments at replicate-level (picard)
    • Re-mark duplicates (picard)
    • Remove duplicate reads (optional; SAMtools)
    • Create normalised bigWig files scaled to 1 million mapped read pairs (BEDTools, wigToBigWig)
  7. Call nucleosome positions and generate smoothed, normalised coverage wig files that can be used to generate occupancy profile plots between samples across features of interest (DANPOS2)
  8. Create IGV session file containing bigWig tracks for data visualisation (IGV)
  9. Collect and present QC at the raw read and alignment-level (MultiQC)
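
Two of the steps above can be illustrated with a short sketch: the pair-level checks in step 5 (same chromosome, FR orientation, and dropping a pair when only one mate passes) and the scaling to 1 million mapped read pairs in step 6. This is not the pipeline's own code. The `Read` class, `is_fr_pair`, `keep_pair` and `scale_factor` are hypothetical names invented for illustration, and the real pipeline works on BAM records via Pysam and BEDTools rather than on plain Python objects.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical, simplified stand-in for one aligned read (not the Pysam API)."""
    name: str
    chrom: str
    pos: int                          # leftmost aligned position
    is_reverse: bool                  # True if aligned to the reverse strand
    mismatches: int                   # number of mismatches in the alignment
    passed_read_filters: bool = True  # outcome of the upstream per-read filters


def is_fr_pair(r1: Read, r2: Read) -> bool:
    """True when the leftmost read is forward and its mate is reverse."""
    left, right = (r1, r2) if r1.pos <= r2.pos else (r2, r1)
    return (not left.is_reverse) and right.is_reverse


def keep_pair(r1: Read, r2: Read, max_mismatches: int = 3) -> bool:
    """Keep a pair only if both mates pass every check."""
    # If only one mate survived the per-read filters, drop the whole pair.
    if not (r1.passed_read_filters and r2.passed_read_filters):
        return False
    # More than max_mismatches in either read removes the pair.
    if r1.mismatches > max_mismatches or r2.mismatches > max_mismatches:
        return False
    # Mates must map to the same chromosome.
    if r1.chrom != r2.chrom:
        return False
    # Mates must be in forward-reverse (FR) orientation.
    return is_fr_pair(r1, r2)


def scale_factor(mapped_read_pairs: int, target: int = 1_000_000) -> float:
    """Multiplier that scales coverage to `target` mapped read pairs."""
    if mapped_read_pairs <= 0:
        raise ValueError("mapped_read_pairs must be positive")
    return target / mapped_read_pairs
```

For example, a sample with 2,000,000 mapped read pairs would have its coverage multiplied by 0.5, and one with 500,000 pairs by 2.0, so that tracks from different samples are directly comparable.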

Documentation

The documentation for the pipeline can be found in the docs/ directory:

  1. Installation
  2. Pipeline configuration
  3. Reference genome
  4. Design file
  5. Running the pipeline
  6. Output and interpretation of results
  7. Troubleshooting

Pipeline DAG

BABS-MNASeqPE directed acyclic graph

Credits

The pipeline was written by The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.

The pipeline was developed by Harshil Patel.

The NGI-RNAseq pipeline developed by Phil Ewels was used as a template for this pipeline. Many thanks to Phil and the team at SciLifeLab.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.