Compute and compare MinHash signatures for DNA data sets.



Compute MinHash signatures for DNA sequences.


sourmash compute *.fq.gz
sourmash compare *.sig -o distances
sourmash plot distances
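
For intuition about what a MinHash signature is and how two signatures can be compared, here is a minimal sketch in Python. It is not sourmash's implementation: the function names, the BLAKE2b hash, and the default parameters are all assumptions chosen for illustration, and it ignores details such as reverse complements. It keeps the smallest hash values of a sequence's k-mers (a "bottom-k" sketch) and estimates Jaccard similarity from two such sketches.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def minhash_signature(seq, k=21, num_hashes=500):
    """Return the num_hashes smallest distinct k-mer hashes, sorted.

    Illustrative only: the hash function and defaults are assumptions,
    not the values sourmash uses.
    """
    hashes = set()
    for kmer in kmers(seq.upper(), k):
        digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return sorted(hashes)[:num_hashes]


def jaccard_estimate(sig_a, sig_b, num_hashes=500):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    a, b = set(sig_a), set(sig_b)
    # The smallest hashes of the union act as a random sample of all k-mers.
    merged = sorted(a | b)[:num_hashes]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


if __name__ == "__main__":
    sig1 = minhash_signature("ACGTACGTTGCAACGT", k=4)
    sig2 = minhash_signature("ACGTACGTTGCATTTT", k=4)
    print(jaccard_estimate(sig1, sig2))
```

Because only a fixed number of hashes is kept per data set, comparing two signatures costs about the same regardless of how large the original sequence files were.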

Interactive demo notebooks are available on binder.


Sourmash is published on JOSS.

The name is a riff on Mash, combined with @ctb's love of whiskey. (Sour mash is used in making whiskey.)

Authors: C. Titus Brown (@ctb) and Luiz C. Irber, Jr (@luizirber).

sourmash is a product of the Lab for Data-Intensive Biology at the UC Davis School of Veterinary Medicine.


To install sourmash with pip:

pip install sourmash

sourmash runs under both Python 2.7.x and Python 3.5. The base requirements are screed and ijson, together with a C++ development environment and the CPython development headers and libraries (for the C++ extension).

The comparison code (sourmash compare) uses numpy, and the plotting code uses matplotlib and scipy, but most of the code is usable without these.


Please ask questions and file issues on GitHub. The developers sometimes hang out on gitter.


Development happens on GitHub at dib-lab/sourmash.

sourmash is the main command-line entry point; run it for help.

sourmash_lib/ contains the library code.

Tests require py.test and can be run with make test.