
Jwindler/Assembly_tools


Assembly analysis tools and papers


Genome assembly tools are organized by pipeline stage. Contributions are welcome, so feel free to get in touch!

Google Group | Volunteers | CONTRIBUTING

If a cited paper contains an error or a tool is missing from the list, please open an ISSUE.

Table of contents

- Survey
- Contig
- Scaffold
- Polish
- Evaluation

Survey

| Name | Introduction | Paper | URL | Note | Publication Date |
|---|---|---|---|---|---|
| GenomeScope | Fast genome analysis from unassembled short reads. | Bioinformatics | Github | | 2017.3 |
| smudgeplot | Analyzes complex genomes, including those with duplications or varying ploidy levels, using k-mer pairs. | Nature Communications | Github | GenomeScope 2.0 | 2020.3 |
| Jellyfish | A tool for fast, memory-efficient counting of k-mers in DNA. | Bioinformatics | Github | | 2011.1 |
| nQuire | A statistical framework for ploidy estimation using NGS short-read data. | BMC Bioinformatics | Github | | 2018.4 |
| KMC | Counts and manipulates k-mer statistics. | Bioinformatics | Github | | 2017.5 |
| KAT | A k-mer analysis toolkit for quality control of NGS datasets and genome assemblies. | Bioinformatics | Github | | 2016.11 |
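The survey tools above all work from k-mer counts. Jellyfish and KMC count k-mers, and GenomeScope fits a model to the resulting k-mer spectrum (histogram). As a rough illustration only, the Python sketch below does three things: it counts canonical k-mers, builds the spectrum, and estimates genome size as total k-mers divided by the depth of the main peak. The helper names and the `min_depth` error cutoff are hypothetical. GenomeScope's actual model also accounts for heterozygosity, duplications, and sequencing errors, so it is considerably more elaborate.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def count_kmers(reads, k):
    """Count canonical k-mers, skipping any k-mer that contains a non-ACGT base."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts

def histogram(counts):
    """k-mer spectrum: multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())

def estimate_genome_size(hist, min_depth=2):
    """Naive estimate: total k-mers above an error cutoff divided by the peak depth.

    Assumes at least one multiplicity >= min_depth is present.
    """
    solid = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak = max(solid, key=lambda depth: solid[depth])
    total = sum(depth * n for depth, n in solid.items())
    return total / peak
```

For example, `estimate_genome_size({1: 100, 10: 50, 11: 40})` ignores the error k-mers at depth 1 and takes depth 10 as the peak. It returns 940 / 10 = 94.0.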

Contig

| Name | Introduction | Paper | URL | Note | Publication Date |
|---|---|---|---|---|---|
| Hifiasm | A fast haplotype-resolved de novo assembler initially designed for PacBio HiFi reads. | Nature Methods | Github | | 2021.2 |
| HiCanu | A modification of Canu designed for highly accurate PacBio HiFi reads. | Genome Research | Github | | 2020.8 |
| NextDenovo | A string graph-based de novo assembler for long reads (CLR, HiFi and ONT). | bioRxiv | Github | | 2023.3 |
| IPA | | | Github | | |
| Flye | A de novo assembler for single-molecule sequencing reads using repeat graphs. | Nature Methods | Github | | 2020.10 |
| Peregrine | A fast genome assembler for accurate long reads (length > 10 kb, accuracy > 99%). | bioRxiv | Github | | 2019.7 |
| HGAP4 | Suitable for assembling genomes across a wide range of sizes and complexity. | Nature Methods | PacBio | | 2013.5 |
| Wtdbg2 | A fuzzy de Bruijn graph approach to assembling long noisy reads. | Nature Methods | Github | | 2019.12 |
| Falcon | A set of tools for fast alignment of long reads for consensus and assembly. | Nature Methods | Github | | 2016.10 |
| SMARTdenovo | An ultra-fast de novo assembler for long noisy reads. | Gigabyte | Github | | 2021.3 |
| miniasm | Ultrafast de novo assembly of long noisy reads (without a consensus step). | Bioinformatics | Github | | 2016.6 |
| necat | An assembler for Nanopore data. | Nature Communications | Github | | 2021.1 |
| Hypo-Assembler | A diploid genome polisher and assembler. | Nature Methods | Github | | 2024.3 |
| Verkko | A hybrid genome assembly pipeline for telomere-to-telomere (T2T) assembly that combines HiFi and ONT reads. | Nature Biotechnology | Github | | 2023.2 |
| NextPolish2 | Repeat-aware polishing of genomes assembled from HiFi long reads. | GPB | Github | | 2024.1 |
| Merfin | Evaluates variant calls in combination with k-mer multiplicity. | Nature Methods | Github | | 2022.3 |
| SOAPdenovo2 | A de novo assembler for next-generation sequencing reads. | GigaScience | Github | | 2012.12 |
| Canu | A single-molecule sequence assembler for genomes large and small. | Genome Research | Github | | 2017.5 |
| MECAT2 | | Nature Methods | Github | | 2017.9 |
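Several of the contig assemblers above are graph-based. SOAPdenovo2 builds a de Bruijn graph from short reads, and Wtdbg2 uses a fuzzy variant of one. The sketch below shows only the textbook, exact de Bruijn idea:

- (k-1)-mers are nodes.
- Each distinct k-mer is an edge.
- Contigs are read off as maximal non-branching paths (unitigs).

This is a hypothetical illustration. It has no error handling, no read correction, and no merging of reverse complements, and it is not how any listed tool is implemented.

```python
from collections import defaultdict

def build_graph(reads, k):
    """Nodes are (k-1)-mers; every distinct k-mer adds one edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        out_edges[kmer[:-1]].append(kmer[1:])
        in_degree[kmer[1:]] += 1
    return out_edges, in_degree

def unitigs(reads, k):
    """Spell out maximal non-branching paths; isolated perfect cycles are skipped."""
    out_edges, in_degree = build_graph(reads, k)
    nodes = sorted(set(out_edges) | set(in_degree))

    def is_simple(node):
        # A 1-in/1-out node lies in the interior of a non-branching path.
        return in_degree[node] == 1 and len(out_edges[node]) == 1

    contigs = []
    for node in nodes:
        if is_simple(node):
            continue  # interior nodes are reached by walking from a branch or tip
        for succ in out_edges[node]:
            current = succ
            path = node + current[-1]
            while is_simple(current):
                current = out_edges[current][0]
                path += current[-1]
            contigs.append(path)
    return contigs
```

For example, `unitigs(["ACGTA", "ACGTC"], 3)` stops at the branch after `GT` and returns `["ACGT", "GTA", "GTC"]`.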

Scaffold

| Name | Introduction | Paper | URL | Note | Publication Date |
|---|---|---|---|---|---|
| 3D-DNA | Scaffolds genomes with Hi-C data. | Science | Github | Uses Hi-C data | 2017.3 |
| LACHESIS | Uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. | Nature Biotechnology | Github | No longer actively developed | 2013.12 |
| SALSA2 | A tool to scaffold long-read assemblies with Hi-C data. | bioRxiv | Github | | 2018.2 |
| YaHS | Yet another Hi-C scaffolding tool. | Bioinformatics | Github | Recommended | 2022.12 |
| instaGRAAL | Large genome reassembly based on Hi-C data; a continuation of GRAAL. | Nature Communications | Github | Requires an NVIDIA graphics card | 2014.12 |
| EndHiC | A fast and easy-to-use Hi-C scaffolding tool. | Quantitative Biology | Github | | 2021.11 |
| Pin_hic | A Hi-C scaffolding method. | BMC Bioinformatics | Github | | 2021.11 |
| AutoHiC | A novel genome assembly pipeline based on deep learning. | bioRxiv | Github | Recommended (deep learning) | 2023.8 |
| ALLHiC | Phasing and scaffolding of polyploid genomes based on Hi-C data. | Nature Plants | Github | Recommended (plants) | 2019.8 |
| Juicebox | A point-and-click interface for using Hi-C heatmaps to identify and correct errors in a genome assembly. | bioRxiv | Github | | 2018.1 |
| SLR | Scaffolding using long reads from third-generation sequencing technologies. | BMC Bioinformatics | Github | | 2019.10 |
| LongStitch | Corrects and scaffolds assemblies using long reads. | BMC Bioinformatics | Github | | 2021.10 |
| RagTag | A collection of software tools for scaffolding and improving modern genome assemblies. | Genome Biology | Github | | 2022.12 |
| HapHiC | A fast, reference-independent, allele-aware scaffolding tool based on Hi-C data. | bioRxiv | Github | | 2023.11 |
| scaffhic | A pipeline for genome scaffolding that models distributions of Hi-C pairs. | | Github | | |
| HiCAssembler | Software to assemble contigs/scaffolds into chromosomes using Hi-C data. | Genes & Development | Github | | 2019.10 |
| HaploHiC | Comprehensive haplotype division of Hi-C paired-end reads based on local contact ratios. | | Github | | |
| DipAsm | Efficient chromosome-scale haplotype-resolved assembly of human genomes. | bioRxiv | Github | | 2020.7 |
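Hi-C scaffolders such as 3D-DNA, SALSA2, and YaHS exploit one property of the data: read pairs linking two contigs are more frequent when those contigs lie close together on a chromosome. The minimal sketch below assumes the Hi-C pairs have already been mapped and reduced to `(contig_a, contig_b)` tuples, which is a hypothetical input format. It counts inter-contig links and picks each contig's strongest partner. Real tools typically also normalize these counts (for example by contig length), use contact positions to order and orient contigs, and detect misjoins.

```python
from collections import Counter

def link_counts(pairs):
    """Count Hi-C read pairs joining two different contigs (unordered contig pairs)."""
    links = Counter()
    for contig_a, contig_b in pairs:
        if contig_a != contig_b:  # intra-contig pairs carry no joining signal
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return links

def strongest_partners(links):
    """Map each contig to the contig it shares the most links with (first seen wins ties)."""
    best = {}
    for (a, b), n in links.items():
        for contig, partner in ((a, b), (b, a)):
            if n > best.get(contig, (None, 0))[1]:
                best[contig] = (partner, n)
    return {contig: partner for contig, (partner, _) in best.items()}
```

With five `ctg1`-`ctg2` pairs and two `ctg2`-`ctg3` pairs, `ctg2` is paired with `ctg1`, and `ctg3` falls back to `ctg2`. A greedy rule like this is only a starting point. Chaining such pairs into full chromosome scaffolds is where the listed tools differ most.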

Polish

| Name | Introduction | Paper | URL | Note | Publication Date |
|---|---|---|---|---|---|
| YAGcloser | Yet another gap closer, based on long reads that span gaps. | Journal of Heredity | Github | | 2022.5 |
| TGS-Gapcloser | A gap-closing tool that uses long reads to improve genome assemblies. | GigaScience | Github | | 2020.9 |
| DENTIST | Closes assembly gaps with high accuracy using long reads. | GigaScience | Github | | 2022.1 |
| Redundans | A pipeline that assists the assembly of heterozygous/polymorphic genomes. | Nucleic Acids Research | Github | | 2016.4 |
| Purge Haplotigs | An effective tool for the early stages of curating highly heterozygous genome assemblies produced from third-generation long-read sequencing. | BMC Bioinformatics | Bitbucket | | 2018.11 |
| Purge_dups | A tool for identifying haplotypic duplications. | Bioinformatics | Github | | 2020.1 |
| Pilon | An automated genome assembly improvement and variant detection tool. | PLOS ONE | Github | | 2014.11 |
| Racon | An ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. | Genome Research | Github | | 2017.1 |
| nextpolish | Fast and accurate polishing of genomes generated from long reads. | Bioinformatics | Github | | 2020.4 |
| HaploMerger2 | | Genome Research | Github | | 2012.5 |
| GapFiller | | Horticulture Research | Github | | 2023.10 |
| RegCloser | | BMC Bioinformatics | Github | | 2023.6 |
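Read-based polishers such as Pilon correct a draft by comparing it with reads aligned to it. A heavily simplified, substitution-only version of that idea is a per-position majority vote over a pileup. In the sketch below, several details are assumptions:

- The pileup format is one list of read bases per draft position.
- The `min_depth` and `min_fraction` thresholds are illustrative values.

Indels, base qualities, and the alignment step itself are out of scope. Racon's consensus, which is based on partial-order alignment, works differently.

```python
from collections import Counter

def polish(draft, pileup, min_depth=3, min_fraction=0.8):
    """Substitution-only majority-vote polishing.

    pileup[i] is the list of read bases aligned to draft[i]; both are assumed
    to have the same length.
    """
    polished = []
    for base, column in zip(draft, pileup):
        if len(column) >= min_depth:
            call, support = Counter(column).most_common(1)[0]
            # Replace the draft base only when the reads clearly disagree with it.
            if call != base and support / len(column) >= min_fraction:
                base = call
        polished.append(base)
    return "".join(polished)
```

For example, `polish("ACGT", [["A"] * 5, ["T"] * 4 + ["C"], ["G"] * 2, ["T"] * 5])` returns `"ATGT"`. The second base is replaced by `T`, which 4 of 5 reads support. The third base is left alone because only two reads cover it.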

Evaluation

Genome assembly evaluation tools.

| Name | Introduction | Paper | URL | Note | Publication Date |
|---|---|---|---|---|---|
| QUAST | A quality assessment tool for evaluating and comparing genome assemblies. | Bioinformatics | Github | | 2013.2 |
| BioNanoAnalyst | A visualisation tool to assess genome assembly quality using BioNano data. | BMC Bioinformatics | Github | Uses BioNano data | 2017.6 |
| CRAQ | Identifies errors in draft genome assemblies at single-base-pair resolution for quality assessment and improvement. | Nature Communications | Github | Single-base resolution | 2023.10 |
| BUSCO | Assesses genome assembly and annotation completeness with single-copy orthologs. | Bioinformatics | BUSCO | | 2015.6 |
| Merqury | k-mer-based assembly evaluation. | Genome Biology | Github | | 2020.9 |
| Klumpy | A tool to evaluate the integrity of long-read genome assemblies and illusive sequence motifs. | bioRxiv | Bitbucket | | |
| GAEP | A comprehensive genome assembly evaluation pipeline. | JGG | Github | | 2023.5 |
| Flagger | Evaluates genome assemblies. | Nature | Github | | 2023.5 |
| Asset | An assembly evaluation tool. | | Github | | |
| Inspector | A tool for evaluating long-read de novo assembly results. | Genome Biology | Github | | 2021.11 |
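Two of the numbers these evaluation tools report are easy to compute by hand.

QUAST reports contig N50: the length L such that contigs of length L or longer cover at least half of the assembly.

Merqury estimates a consensus quality value (QV) from k-mers. Let K_asm be the number of assembly k-mers absent from the read set and K_total the total number of assembly k-mers. The per-base error rate is E = 1 - (1 - K_asm/K_total)^(1/k), and QV = -10 log10(E).

The functions below are a minimal sketch of those two formulas only. The function names are hypothetical, and neither function reproduces the alignment or k-mer counting steps of the actual tools.

```python
import math

def n50(lengths):
    """Smallest contig length among the longest contigs covering >= 50% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

def merqury_qv(asm_only_kmers, total_kmers, k):
    """Merqury-style QV from k-mers found only in the assembly (not in the reads)."""
    error_rate = 1 - (1 - asm_only_kmers / total_kmers) ** (1 / k)
    if error_rate <= 0:
        return math.inf  # every assembly k-mer is supported by the reads
    return -10 * math.log10(error_rate)
```

For example, `n50([2, 3, 4, 5, 6])` returns 5, because 6 + 5 = 11 already covers at least half of the 20 bp total. `merqury_qv(10, 100, 1)` returns approximately 10.0.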

About

A collection of tools for genome assembly analysis
