Mash-sketched reference databases are an up-to-date representation of several NCBI genomic, proteomic and metagenomic data, curated and formatted according to the sketching MinHash algorithm developed by Ondov et al. 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. Jun 20;17(1):132. doi: 10.1186/s13059-016-0…

ayixon/Mash-sketched-reference-databases

Mash-sketched-reference-databases

Mash-sketched reference databases are up-to-date representations of several NCBI genomic, proteomic and metagenomic datasets, curated and formatted according to the MinHash sketching algorithm developed by Ondov et al. (2016). Each dataset is in .msh format and mostly represents information from type material, in order to offer informative contexts with standardized nomenclature when possible.

You can access and download all pre-sketched files from:

Sánchez-Reyes, Ayixon; Fernández-López, Maikel Gilberto (2021): Mash sketched databases for: Mash Sketched Reference Dataset for Genome-Based Taxonomy and Comparative Genomics. figshare. Online resource

Data information:

°Bacteria_Archaea_type_assembly_set.msh => 16,304 type genomes from NCBI

°Bacteria_Archaea_type_proteome_set.msh => 12,767 type predicted proteomes from NCBI

°GTDB_r202_assembly_set.msh => 31,910 genomes from GTDB

°Fungi_type_assembly_set.msh => 753 type genomes from NCBI

°Fungi_type_proteome_set.msh => 248 type predicted proteomes from NCBI

°Virus_Sept21_GenBank_assembly_set.msh => 40,708 viral assemblies from NCBI

°Soil_Metgenome_assembly_set.msh => 479 soil metagenomes from NCBI

°Freshwater_Metagenome_assembly_set.msh => 611 freshwater metagenomes from NCBI

°Fungal_Database.2022_genomic.fna.gz.msh => 4,293 filamentous and yeast-like fungal genomes

Usage

You can use any of these files with the Mash tool (see Ondov et al., 2016, About Mash).

Example

$ mash dist query_genome Bacteria_Archaea_type_assembly_set.msh > output
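
The file written by the command above is tab-delimited text. According to the Mash documentation, each `mash dist` line holds five fields: reference ID, query ID, Mash distance, p-value and matching hashes. A minimal sketch for listing the closest entries first, assuming that five-column layout; the function name `top_hits` and the default of 5 rows are illustrative choices, not part of Mash:

```shell
# top_hits FILE [N]: print the N rows with the smallest Mash distance.
# Assumes a tab-separated `mash dist` table with the distance in column 3.
# `top_hits` is a hypothetical helper name, not a Mash command.
top_hits() {
    file=$1
    n=${2:-5}
    tab=$(printf '\t')
    # -g (general numeric, GNU and BSD sort) also orders values written
    # in scientific notation, which can appear for very small numbers.
    LC_ALL=C sort -t "$tab" -k3,3g "$file" | head -n "$n"
}
```

For instance, `top_hits output 10` would list the ten database entries closest to `query_genome`. Lower distances indicate more similar genomes, so the first rows are the most informative candidates for taxonomic context.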

If you find this information useful for your work, please cite:

Sánchez-Reyes, A., & Fernández-López, M. G. (2024). Sketched reference databases for genome-based taxonomy and comparative genomics. Brazilian Journal of Biology, 84, e256673. https://doi.org/10.1590/1519-6984.256673

Sánchez-Reyes, A., & Fernández-López, M. G. (2021). Mash sketched databases for: Mash Sketched Reference Dataset for Genome-Based Taxonomy and Comparative Genomics. figshare. Online resource. https://doi.org/10.6084/m9.figshare.14408801.v6

Ondov et al. 2016. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. Jun 20;17(1):132. https://doi.org/10.1186/s13059-016-0997-x
