
ChongLab/B-assembler

 
 

B-assembler

B-assembler is a Snakemake-based pipeline for assembling bacterial genomes from long reads (Nanopore or PacBio) or from hybrid reads (long and short reads).

Introduction

As input, B-assembler takes one of the following:

  • A set of long reads (Nanopore or PacBio) from a bacterial isolate (uncorrected long reads are fine, though corrected long reads should work too)
  • Paired-end Illumina reads from a bacterial isolate together with long reads from the same isolate (best case)

Reasons to use B-assembler:

  • It packages best practices for long-read bioinformatics into a (hopefully) easy-to-use pipeline, taking advantage of all the goodness of Snakemake while adding a few features, including:
    • a text-based read config that allows simple, automated read pre-processing (selecting reads by read length)
    • a text-based run config that provides a trivial way to define assembly and polishing strategies (long-read-only or hybrid mode)
    • automatic generation of the assembly
  • It circularises the genome without the need for a separate tool like Circlator.
  • It can assemble from long reads alone or from long and short reads together (hybrid assembly).
  • It has very low misassembly rates.
  • It's easy to use: it runs with just one command and usually doesn't require tinkering with parameters.
  • It is fast: a run takes only a few hours.

Reasons to not use B-assembler:

  • You're assembling a eukaryotic genome or a metagenome (the pipeline is designed exclusively for bacterial isolates).
  • Your long reads are low depth (<50x).
  • Your Illumina reads and long reads are from different isolates.
  • You have only Illumina reads.
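To check the depth criterion before running, long-read depth can be estimated as total bases divided by the estimated genome size. The following is a minimal sketch, assuming the `<50` threshold refers to fold coverage and that the reads are in an uncompressed FASTQ file with four lines per record. The function name `estimate_depth` is hypothetical and not part of B-assembler.

```shell
# Hypothetical helper, not part of B-assembler.
# $1: uncompressed FASTQ file with four lines per record (assumption)
# $2: estimated genome size in base pairs
estimate_depth() {
    awk -v genome="$2" '
        NR % 4 == 2 { bases += length($0) }   # line 2 of each record is the sequence
        END { printf "%.1f\n", bases / genome }
    ' "$1"
}

# Example call (path and genome size are placeholders):
# estimate_depth /path/to/long_reads.fastq 5000000
```

A printed value below 50 would suggest the long reads alone are too shallow for this pipeline.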

Requirements

Installation

Install from source

git clone https://github.com/huang1990/B-assembler.git; cd B-assembler;

Set up the environment

conda env create -n B-assembler -f env.yaml
conda activate B-assembler

Note: Make sure that all of the tools installed from Bioconda were installed successfully.

Write your configuration

Provide sequence data in the config.yaml file

vi config.yaml

Replace the values of the YAML keys as appropriate. The keys are:

| Key | Type | Description |
| --- | --- | --- |
| longread | Path to Nanopore or PacBio reads | Path to your long reads. Combine all of your reads into one FASTQ file. An absolute path is recommended. You can ignore this key if you do not have long reads. |
| Illumina R1 | Path to Illumina R1 | Read 1 of the paired-end Illumina reads. An absolute path is recommended. You can ignore this key if you do not have Illumina reads. |
| Illumina R2 | Path to Illumina R2 | Read 2 of the paired-end Illumina reads. An absolute path is recommended. You can ignore this key if you do not have Illumina reads. |
| genomesize | int | Estimated genome size of your species, in base pairs |
| readtype | ONT or pb | Type of your long reads: ONT for Nanopore, pb for PacBio |
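As an illustration of the keys listed above, the sketch below writes an example file for a hybrid run. The key spellings are copied from the key list and may differ from the config.yaml shipped with the repository, so check that file before copying anything. All paths and the genome size are placeholders, and the file name `config.example.yaml` is chosen here only to avoid overwriting a real config.

```shell
# Write an example config for a hybrid run.
# Assumptions: key spellings follow the key list in this README;
# paths and genome size are placeholders.
cat > config.example.yaml <<'EOF'
longread: /path/to/long_reads.fastq
Illumina R1: /path/to/illumina_R1.fastq
Illumina R2: /path/to/illumina_R2.fastq
genomesize: 5000000
readtype: ONT
EOF
```

For a long-read-only run, the two Illumina entries would simply not be needed.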

Engage the pipeline

Run the pipeline. You must specify the number of CPU cores, which sets how many threads the pipeline can use.

Usage

Usage: bash run_B-assembler.sh <numCPUs> <LongReadOnly|Hybrid> [output:PWD]

Required arguments:
numCPUs: int
         threads provided for pipeline

LongReadOnly|Hybrid
         assembly mode for your reads, type "LongReadOnly" or "Hybrid" based on your data

Optional argument:
output:
	 output directory, current working directory by default

Examples

bash run_B-assembler.sh 2 LongReadOnly
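Because the script takes positional arguments, a mistyped mode or thread count is easy to miss. The sketch below checks the two required arguments against the usage text before launching. The function `validate_args` is a hypothetical helper, not part of B-assembler, and the output path in the example call is a placeholder.

```shell
# Hypothetical helper, not part of B-assembler: check the two required
# arguments described in the usage text before starting a run.
validate_args() {
    # $1: numCPUs, $2: assembly mode
    case "$1" in
        ''|0|*[!0-9]*)
            echo "numCPUs must be a positive integer" >&2
            return 1 ;;
    esac
    case "$2" in
        LongReadOnly|Hybrid) ;;
        *)
            echo "mode must be LongReadOnly or Hybrid" >&2
            return 1 ;;
    esac
}

# Example: hybrid assembly on 8 threads with an explicit output directory.
# validate_args 8 Hybrid && bash run_B-assembler.sh 8 Hybrid /path/to/output
```

The mode check is case-sensitive, matching the spellings given in the usage text.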

Output

The final assembly is written to the output directory as B-assembler.fasta.
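A quick sanity check on the result is to count the sequences in the final FASTA file, since each record begins with a `>` header line. This is a minimal sketch; `count_contigs` is a hypothetical helper and not part of B-assembler.

```shell
# Hypothetical helper, not part of B-assembler.
# $1: FASTA file; prints the number of records (header lines starting with '>').
count_contigs() {
    grep -c '^>' "$1"
}

# Example call, run from the output directory:
# count_contigs B-assembler.fasta
```

For a bacterial isolate, a small count (a chromosome plus any plasmids) is the usual expectation, while a large count may indicate a fragmented assembly.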
