Kevin Menden edited this page Apr 30, 2018 · 3 revisions

Comparison of hybrid assemblers

While this pipeline was being developed, the performance of the implemented assembly tools was evaluated on several datasets. The evaluation mainly targeted a low- to medium-coverage scenario (8-10x Nanopore, 20-40x Illumina), but it will be extended to other scenarios as well.

Which assembler to choose also depends on the type of genome being analyzed: some tools may perform better on microbial genomes, while others perform better on large eukaryotic genomes.

All statistics were calculated using QUAST. The assemblies were created using the tools from this pipeline.

Low to medium coverage

The following assemblies were made during the test phase of the pipeline, because we wanted to know which assembler performs best at these coverages:

  • Short-reads: 20-40x
  • Long-reads: 5-10x

Because we planned to assemble the genomes of multiple samples from several species (mammals, human), generating higher coverage was not feasible. For this test, only Nanopore long reads were used. Data from different species was downloaded and subsampled to 10x long-read and 25x short-read coverage.
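Subsampling to a target coverage can be sketched as follows. The page does not say which tool performed the subsampling, so this is a hypothetical illustration, not the pipeline's method: coverage is taken as total read bases divided by genome size, and reads are shuffled with a fixed seed and kept until the base budget `target_cov * genome_size` is reached. The names `coverage`, `subsample_fraction` and `subsample_reads` are illustrative only.

```python
import random


def coverage(read_lengths, genome_size):
    """Depth of coverage = total sequenced bases / genome size."""
    return sum(read_lengths) / genome_size


def subsample_fraction(current_cov, target_cov):
    """Fraction of bases to keep to go from current_cov down to target_cov."""
    if target_cov >= current_cov:
        return 1.0  # cannot increase coverage by subsampling
    return target_cov / current_cov


def subsample_reads(reads, genome_size, target_cov, seed=42):
    """Hypothetical sketch: randomly keep reads until the kept bases
    reach target_cov * genome_size (the last read may overshoot slightly)."""
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    shuffled = list(reads)
    rng.shuffle(shuffled)
    budget = target_cov * genome_size
    kept, total = [], 0
    for read in shuffled:
        if total >= budget:
            break
        kept.append(read)
        total += len(read)
    return kept
```

For example, reducing a 40x short-read set to 25x keeps 62.5% of the bases: `subsample_fraction(40.0, 25.0)` returns `0.625`. Filling a base budget rather than keeping a fixed fraction of reads matters for Nanopore data, where read lengths vary widely.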

E. coli K-12

Metric                     MaSuRCA     SPAdes      Canu
Genome statistics
Genome fraction (%)        95.593      98.883      -
Duplication ratio          1.01        1.001       -
Largest alignment (bp)     350,299     539,289     -
Total aligned length (bp)  4,478,886   4,590,963   -
NG50 (bp)                  125,891     236,291     -
NA50 (bp)                  116,095     193,949     -
NA50 (bp)                  130,492     236,291     -
Misassemblies              6           2           -
Scaffolds                  55          39          -
Runtime                    2m 37s      7m 95s      -
Memory                     2.7 GB      5.0 GB      -
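The contiguity metrics in the table follow QUAST's definitions. N50 is the length L such that contigs of length L or longer cover at least half of the total assembly length. NG50 uses half of the reference genome length instead. NA50 and NGA50 apply the same computation to aligned blocks (contigs broken at misassembly events, with unaligned bases removed). A minimal sketch of the N50/NG50 computation follows. The helper name `nx50` is hypothetical, and this is not QUAST's own code.

```python
def nx50(contig_lengths, reference_length=None):
    """Return N50 (reference_length is None) or NG50 (reference_length given).

    Sorts lengths in descending order and returns the length at which the
    running sum first reaches half of the assembly (N50) or half of the
    reference (NG50). Returns None if that point is never reached, i.e.
    NG50 is undefined for a very incomplete assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = reference_length if reference_length is not None else sum(lengths)
    target = total / 2
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    return None
```

For the contigs `[100, 80, 50, 30, 20]` (280 bp total), `nx50` gives an N50 of 80. With a 500 bp reference, it gives an NG50 of 30. If the assembly covers less than the full genome, its total length is smaller than the reference, so its NG50 is at most its N50.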

Candida

C. elegans
