B-assembler is a Snakemake-based pipeline for assembling bacterial genomes from long reads alone (Nanopore or PacBio) or from hybrid data (long and short reads).


As input, B-assembler takes one of the following:

  • A set of long reads (Nanopore or PacBio) from a bacterial isolate. Uncorrected long reads are fine, though corrected long reads should work too.
  • Paired-end Illumina reads and long reads from the same isolate (the best case). The Illumina reads must be paired-end.

Reasons to use B-assembler:

  • It packages best practices for long-read bioinformatics into a (hopefully) easy-to-use pipeline, taking advantage of all the goodness of Snakemake while adding a few features, including:
    • a text-based read config that automates simple read pre-processing (selecting reads by read length)
    • a text-based run config that provides a trivial way to define assembly and polishing strategies (long-read-only or hybrid mode)
    • automatic generation of the assembly
  • It circularises genomes without the need for a separate tool like Circlator.
  • It can assemble from long reads alone or from long and short reads together (hybrid assembly).
  • It has very low misassembly rates.
  • It's easy to use: it runs with just one command and usually doesn't require tinkering with parameters.
  • It is fast: a run needs just a few hours.
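
To make the read pre-processing step in the list above concrete, here is a minimal sketch of selecting reads by length. It is not B-assembler's own code: the function names `parse_fastq` and `filter_by_length` and the 5-base threshold are hypothetical, chosen only for illustration, and the sketch assumes well-formed four-line FASTQ records.

```python
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").strip()
        next(it, "")  # skip the '+' separator line
        qual = next(it, "").strip()
        yield header, seq, qual


def filter_by_length(records: Iterable[FastqRecord], min_length: int) -> List[FastqRecord]:
    """Keep only reads whose sequence has at least min_length bases."""
    return [rec for rec in records if len(rec[1]) >= min_length]


# Hypothetical toy input: one 4-base read and one 10-base read.
example = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGTACGTAC", "+", "IIIIIIIIII",
]
kept = filter_by_length(parse_fastq(example), min_length=5)
print([header for header, _, _ in kept])  # ['@r2']
```

With real data the same functions could be fed an open file handle instead of a list, since any iterable of lines works.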

Reasons to not use B-assembler:

  • You're assembling a eukaryotic genome or a metagenome (the pipeline is designed exclusively for bacterial isolates).
  • Your long reads are low depth (<50x).
  • Your Illumina reads and long reads are from different isolates.
  • You have only Illumina reads.
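
To check the depth criterion above before running the pipeline, depth can be approximated as the total number of sequenced bases divided by the genome size. The sketch below is only an illustration under that assumption; `estimate_depth`, the read lengths, and the 5 Mb genome size are hypothetical and not part of B-assembler.

```python
from typing import Iterable


def estimate_depth(read_lengths: Iterable[int], genome_size: int) -> float:
    """Approximate sequencing depth as total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive number of base pairs")
    return sum(read_lengths) / genome_size


# Hypothetical data set: 20,000 reads of 10 kb each against a 5 Mb genome.
lengths = [10_000] * 20_000
depth = estimate_depth(lengths, genome_size=5_000_000)
print(depth)  # 40.0

if depth < 50:
    print("Long-read depth is below 50x; B-assembler may not be a good fit.")
```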



Install from source

git clone; cd B-assembler;

Set up the environment

conda env create -n B-assembler -f env.yaml
conda activate B-assembler

Note: Make sure that all tools from Bioconda were installed successfully.

Write your configuration

Provide sequence data in the config.yaml file

vi config.yaml

Replace the values of the YAML keys as appropriate. The keys are:

| Key | Type | Description |
| --- | --- | --- |
| longread | Path to Nanopore or PacBio reads | Path to your long reads. Combine all your reads into one FASTQ file. An absolute path is recommended. You can ignore this key if you do not have long reads. |
| Illumina R1 | Path to Illumina R1 | Read 1 of the paired-end Illumina reads. An absolute path is recommended. You can ignore this key if you do not have Illumina reads. |
| Illumina R2 | Path to Illumina R2 | Read 2 of the paired-end Illumina reads. An absolute path is recommended. You can ignore this key if you do not have Illumina reads. |
| genomesize | int | Estimated genome size of your species, in base pairs. |
| readtype | ONT or pb | Type of your long reads: ONT for Nanopore, pb for PacBio. |
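
The key descriptions above can be turned into a quick sanity check before starting a run. The sketch below works on a plain dictionary that mirrors config.yaml. It is not part of B-assembler: `check_config` is a hypothetical helper, the key spellings are copied from the key list above, the file path is a placeholder, and the rule that Hybrid mode needs both Illumina keys is an inference from this README rather than documented behaviour.

```python
from typing import Dict, List


def check_config(config: Dict[str, object], mode: str) -> List[str]:
    """Return a list of problems found in the config; an empty list means OK."""
    problems = []
    if mode not in ("LongReadOnly", "Hybrid"):
        problems.append(f"mode: expected LongReadOnly or Hybrid, got {mode!r}")
    if not config.get("longread"):
        problems.append("longread: path to the long reads is missing")
    if mode == "Hybrid":
        # Assumption: hybrid assembly needs both paired-end files.
        for key in ("Illumina R1", "Illumina R2"):
            if not config.get(key):
                problems.append(f"{key}: path is missing")
    size = config.get("genomesize")
    if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
        problems.append("genomesize: expected a positive integer (base pairs)")
    if config.get("readtype") not in ("ONT", "pb"):
        problems.append("readtype: expected ONT or pb")
    return problems


# Hypothetical long-read-only configuration; the path is a placeholder.
cfg = {"longread": "/data/isolate1/reads.fastq", "genomesize": 5_000_000, "readtype": "ONT"}
print(check_config(cfg, "LongReadOnly"))  # []
print(check_config(cfg, "Hybrid"))
```

The second call reports the two missing Illumina keys, which is the kind of mismatch between data and assembly mode that is cheaper to catch before the pipeline starts.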

Engage the pipeline

Run the pipeline. You must specify the number of cores so that the pipeline knows how many threads it may use.

Usage

Usage: bash <numCPUs> <LongReadOnly|Hybrid> [output:PWD]

Required arguments:
numCPUs: int
         number of threads available to the pipeline
LongReadOnly|Hybrid: string
         assembly mode for your reads; type "LongReadOnly" or "Hybrid" based on your data

Optional argument:
output: string
         output directory; defaults to the current working directory


bash 2 LongReadOnly


The final assembly is written to the output directory as B-assembler.fasta.