(temporarily named "genome-subsampler")
Reference genome databases are essential for many bioinformatic applications, including taxonomic and functional profiling, reference-based assembly and scaffolding, and phylogenomic reconstruction and placement. However, the rapid growth of available genome sequences increasingly conflicts with limited computing power and with the demand for efficient, accurate database searches.
This tool attempts to resolve this conflict by reducing the size of the reference genome database while retaining as much biodiversity as a set of statistical and empirical measurements permits.
This tool is written in Python 3.
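
To make the idea of shrinking a database while keeping its diversity more concrete, here is a minimal sketch of one generic approach: greedy farthest-first selection over a pairwise distance matrix. It is an illustration only and is not taken from this tool; the function name `subsample_max_min`, the genome labels, and the distance values are all hypothetical, and the measurements this tool actually uses are not described here.

```python
def subsample_max_min(names, dist, k):
    """Pick k genomes by greedy farthest-first selection.

    names: list of genome labels (hypothetical).
    dist:  symmetric matrix of pairwise distances, dist[i][j] >= 0.
    k:     number of genomes to keep.
    """
    n = len(names)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of genomes")

    chosen = [0]  # arbitrary seed: start from the first genome
    # min_dist[i] is the distance from genome i to its nearest chosen genome
    min_dist = list(dist[0])

    while len(chosen) < k:
        candidates = [i for i in range(n) if i not in chosen]
        # keep the genome that is farthest from everything chosen so far
        nxt = max(candidates, key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(a, b) for a, b in zip(min_dist, dist[nxt])]

    return [names[i] for i in chosen]


# Made-up example: A1 and A2 are near-duplicates, B1 and C1 are distinct.
names = ["A1", "A2", "B1", "C1"]
dist = [
    [0.00, 0.01, 0.30, 0.45],
    [0.01, 0.00, 0.31, 0.44],
    [0.30, 0.31, 0.00, 0.40],
    [0.45, 0.44, 0.40, 0.00],
]
kept = subsample_max_min(names, dist, 3)
print(kept)
```

With these made-up distances, the near-duplicate A2 is the genome that gets dropped, which is the behavior one would want from a diversity-preserving subsample. A real database would need a scalable distance estimate and a principled way to choose `k`, both of which this sketch leaves open.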