New sketchlib backend

@johnlees released this 10 Feb 15:08
· 2172 commits to master since this release

This is a major new release of PopPUNK, which uses a new backend, pp-sketchlib, for sketching and distance calculation.

This release changes the input format and, to some extent, the API. It is incompatible with databases created by previous versions and generates slightly different distance results. If you need backwards compatibility, the previous sketching method can still be run by specifying --use-mash.

New features:

  • Use pp-sketchlib as the backend. This is ~2x faster for sketching and 50-100x faster for distance calculations. Databases are ~1/4 of the size.
  • Input data is now given as a tab-separated file, with each sample name followed by its associated sequence file(s). Sample names no longer have to be filenames. (Closes #43, #46)
  • Read data can now be used as input, with a filter to remove k-mers containing sequencing errors.
  • Faster database edits with prune_db and reference_pick.
  • Ability to use the previous sketching method and databases by specifying --use-mash.
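
As a rough illustration of the new input format, the sketch below writes a tab-separated sample list with one sample per line: the sample name first, then its sequence file(s). The sample names, file paths, output filename and the helper write_sample_list are all hypothetical and not part of PopPUNK. The layout follows the description above rather than a confirmed specification.

```python
from pathlib import Path


def write_sample_list(samples, out_path):
    """Write one line per sample: name, then its sequence files, tab-separated.

    `samples` maps a sample name to a list of sequence file paths.
    This helper is illustrative only; it is not part of PopPUNK.
    """
    lines = []
    for name, files in samples.items():
        # Sample name in the first column, sequence files in the following columns
        lines.append("\t".join([name, *files]))
    Path(out_path).write_text("\n".join(lines) + "\n")


# Hypothetical samples: one assembly, and one sample with paired read files
samples = {
    "sample1": ["assemblies/sample1.fa"],
    "sample2": ["reads/sample2_1.fastq.gz", "reads/sample2_2.fastq.gz"],
}

write_sample_list(samples, "sample_list.txt")
print(Path("sample_list.txt").read_text(), end="")
```

Because the name sits in its own column, it can be any label you choose and does not need to match a filename.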

Bug fixes:

  • Better error handling when creating visualisations, so that other output files are still produced if visualisation fails.