New sketchlib backend
This is a major new release of PopPUNK, which uses a new 'backend', pp-sketchlib, to do sketching and distance calculation.
This changes the input format and parts of the API, is incompatible with databases from previous versions, and generates slightly different distance results. If you need backwards compatibility, the previous method can still be run by specifying --use-mash.
New features:
- Use pp-sketchlib as the backend. This is ~2x faster for sketching and 50-100x faster for distance calculations. Databases are ~1/4 of the size.
- Input data is now formatted as a tab-separated file, with each line giving a sample name followed by any associated sequence files. Sample names no longer have to be filenames. (Closes #43, #46)
- Read data can now be handled, including a filter to remove k-mers containing sequencing errors
- Faster database edits with prune_db and reference_pick
- Ability to use the previous sketching method and databases by specifying --use-mash
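
As a rough illustration of the new tab-separated input format, the Python sketch below writes one line per sample: the sample name, then each associated sequence file. The helper `write_sample_list`, the sample names and the file paths are all hypothetical and are not part of PopPUNK; this only shows the shape of the file, assuming one sample per line.

```python
import csv
import io


def write_sample_list(samples, handle):
    """Write one line per sample: name, then each sequence file, tab separated."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, files in samples.items():
        writer.writerow([name, *files])


# Hypothetical sample names and file paths, for illustration only
samples = {
    "sample1": ["assemblies/sample1.fa"],
    "sample2": ["reads/sample2_1.fastq.gz", "reads/sample2_2.fastq.gz"],
}

# Written to an in-memory buffer here; a real file handle works the same way
buffer = io.StringIO()
write_sample_list(samples, buffer)
print(buffer.getvalue(), end="")
```

Because the name is its own column, it can be any label and does not need to match a filename.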
Bug fixes:
- Better error handling when creating visualisations, so output files are still produced if this fails