# Sourmash!

[sourmash](https://sourmash.readthedocs.io/en/latest/) is research software from the Lab for Data Intensive Biology (that's my lab!) at UC Davis. It implements two sketching techniques, MinHash and modulo hash; the latter is what sourmash calls "scaled" signatures.

Sourmash works on *signature files*, which are just saved collections of hashes.
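To make the two sketching schemes concrete, here is a minimal Python sketch of both under loudly stated assumptions. It hashes canonical k-mers (the smaller of a k-mer and its reverse complement) with the standard library's BLAKE2b as a stand-in for the MurmurHash3 that sourmash uses, it assumes an uppercase ACGT-only sequence, the "genome" is random made-up data, and the helper names (`kmer_hashes`, `minhash_sketch`, `scaled_sketch`) are hypothetical. MinHash keeps a fixed number of the smallest hashes; the scaled variant keeps every hash below 2**64 / scaled, which selects roughly one in `scaled` distinct k-mers.

```python
import hashlib
import random

# Complement table for building reverse complements (assumes uppercase ACGT only)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hash_kmer(kmer: str) -> int:
    # Stand-in 64-bit hash (BLAKE2b); sourmash uses MurmurHash3, so these
    # values are illustrative and will not match real sourmash hashes
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def kmer_hashes(seq: str, ksize: int) -> set[int]:
    # Hash every canonical k-mer: the smaller of the k-mer and its reverse complement
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize]
        revcomp = kmer.translate(COMPLEMENT)[::-1]
        hashes.add(hash_kmer(min(kmer, revcomp)))
    return hashes


def minhash_sketch(seq: str, ksize: int, num: int) -> set[int]:
    # Classic (bottom-k) MinHash: keep the `num` smallest hashes
    return set(sorted(kmer_hashes(seq, ksize))[:num])


def scaled_sketch(seq: str, ksize: int, scaled: int) -> set[int]:
    # Scaled sketch: keep every hash below 2**64 / scaled (about 1 in `scaled` k-mers)
    max_hash = 2**64 // scaled
    return {h for h in kmer_hashes(seq, ksize) if h < max_hash}


# A synthetic 50 kb random "genome" (made-up data for illustration)
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(50_000))

all_hashes = kmer_hashes(genome, 31)
mh = minhash_sketch(genome, 31, num=500)
scaled_hashes = scaled_sketch(genome, 31, scaled=1000)
print(len(all_hashes), len(mh), len(scaled_hashes))
```

Because the scaled sketch keeps everything under a fixed threshold, its size grows with the number of distinct k-mers instead of being capped at `num`, and it is simply the bottom slice of the hash values.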

Let's try it out!

## Compute scaled signatures

In [None]:
!rm -f *.sig
!sourmash compute -k 21,31,51 --scaled=1000 genomes/*.fa --name-from-first -f

This outputs one signature file per genome (three in total, named like `shew_os223.fa.sig`), each containing three signatures: one calculated at k=21, one at k=31, and one at k=51.

In [None]:
ls *.sig

We can now use these signature files for various comparisons.
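Signature files are stored as JSON, so you can also peek inside one from plain Python. The sketch below assumes the layout sourmash has used for `.sig` files (a list of records, each with a `name` and a `signatures` list whose entries carry a `ksize` and a `mins` list of hashes); check that assumption against your own files. `hashes_by_name` and `load_sig_file` are hypothetical helpers, not part of sourmash's own Python API.

```python
import glob
import json


def hashes_by_name(data, ksize: int) -> dict[str, set[int]]:
    # Accept either a list of signature records or a single record
    records = data if isinstance(data, list) else [data]
    result = {}
    for record in records:
        name = record.get("name") or record.get("filename", "")
        for sketch in record.get("signatures", []):
            # Keep only the sketch computed at the requested k-mer size
            if sketch.get("ksize") == ksize:
                result[name] = set(sketch.get("mins", []))
    return result


def load_sig_file(path: str, ksize: int) -> dict[str, set[int]]:
    # Parse one uncompressed .sig file from disk
    with open(path) as fh:
        return hashes_by_name(json.load(fh), ksize)


# Report how many k=31 hashes each signature in the current directory holds
for path in sorted(glob.glob("*.sig")):
    for name, hashes in load_sig_file(path, 31).items():
        print(path, name, len(hashes))
```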

## Search multiple signatures with a query

The command below uses the `shew_os223` signature as a query against all of the signature files in the directory, using the k=31 signatures, and reports the matches with the highest Jaccard similarity:

In [None]:
!sourmash search -k 31 shew_os223.fa.sig *.sig
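With scaled sketches, Jaccard similarity can be estimated directly from the hashes: shared hashes divided by all distinct hashes in either sketch. Here is a minimal sketch over Python sets of made-up hashes, assuming both sketches were computed with the same `ksize` and `scaled`; the `jaccard` and `search` helpers and the `threshold` parameter are illustrative, not sourmash's code.

```python
def jaccard(a: set[int], b: set[int]) -> float:
    # Shared hashes divided by all distinct hashes in either sketch
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query: set[int], database: dict[str, set[int]], threshold: float = 0.0):
    # Score every entry against the query and list the hits best-first
    hits = [(jaccard(query, hashes), name) for name, hashes in database.items()]
    return sorted([hit for hit in hits if hit[0] > threshold], reverse=True)


query = {1, 2, 3, 4}
database = {"same": {1, 2, 3, 4}, "partial": {3, 4, 5, 6}, "unrelated": {7, 8}}
for score, name in search(query, database):
    print(f"{score:.3f}  {name}")  # prints "1.000  same", then "0.333  partial"
```

As in the `sourmash search` call above, the query also matches itself perfectly, because `*.sig` includes the query's own file.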

The command below scores matches by Jaccard containment (the fraction of the query's hashes found in each match) instead of Jaccard similarity:

In [None]:
!sourmash search -k 31 shew_os223.fa.sig *.sig --containment
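Containment is asymmetric: it divides the shared hashes by the size of the query only, so a small sketch fully contained in a larger one scores 1.0 even when their Jaccard similarity is low. A minimal sketch over Python sets of made-up hashes, assuming both come from the same `ksize` and `scaled` (helper names are hypothetical):

```python
def containment(query: set[int], match: set[int]) -> float:
    # Fraction of the query's hashes that also occur in the match
    return len(query & match) / len(query) if query else 0.0


def jaccard(a: set[int], b: set[int]) -> float:
    # Shared hashes divided by all distinct hashes in either sketch
    union = a | b
    return len(a & b) / len(union) if union else 0.0


small = {1, 2}                     # a small sketch (made-up hashes)
large = {1, 2, 3, 4, 5, 6, 7, 8}   # a larger sketch that contains it
print(containment(small, large))   # 1.0: all of `small` is in `large`
print(containment(large, small))   # 0.25: only 2 of 8 hashes
print(jaccard(small, large))       # 0.25: 2 shared / 8 total
```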

## Performing all-by-all queries

We can also compare all three signatures against each other:

In [None]:
!sourmash compare -k 31 *.sig

...and produce a similarity matrix that we can use for plotting:

In [None]:
!sourmash compare -k 31 *.sig -o genome_compare.mat

In [None]:
!sourmash plot genome_compare.mat

from IPython.display import Image
Image(filename='genome_compare.mat.matrix.png') 

And for the R aficionados, you can output a CSV version of the matrix:

In [None]:
!sourmash compare -k 31 *.sig --csv genome_compare.csv

In [None]:
!cat genome_compare.csv

You can load this file into R and examine it; see [our documentation](https://sourmash.readthedocs.io/en/latest/other-languages.html) for details.
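Conceptually, an all-by-all comparison fills in a symmetric matrix of pairwise Jaccard similarities with 1.0 on the diagonal. The sketch below does that for a few made-up hash sets and writes the result as CSV; the helper names and the CSV layout (a header row of names followed by one row of values per signature) are assumptions for illustration and may not match sourmash's output exactly.

```python
import csv
import io


def jaccard(a: set[int], b: set[int]) -> float:
    # Shared hashes divided by all distinct hashes in either sketch
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_matrix(sketches: dict[str, set[int]]):
    # All-by-all Jaccard matrix, rows and columns in the same name order
    names = list(sketches)
    matrix = [[jaccard(sketches[r], sketches[c]) for c in names] for r in names]
    return names, matrix


def to_csv(names: list[str], matrix: list[list[float]]) -> str:
    # Header row of names, then one row of similarity values per signature
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(names)
    writer.writerows(matrix)
    return buf.getvalue()


names, matrix = compare_matrix({"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {7, 8}})
print(to_csv(names, matrix))
```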

## Working with metagenomes

Let's make a fake metagenome by concatenating all of the genomes, then compute its k=31 signature:

In [None]:
!cat genomes/*.fa > fake-metagenome.fa
!sourmash compute -k 31 --scaled=1000 fake-metagenome.fa

We can use the `sourmash gather` command to see what's in it:

In [None]:
!sourmash gather fake-metagenome.fa.sig shew*.sig akker*.sig
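`gather` works greedily: roughly, it finds the reference that shares the most hashes with the query, reports it, removes those hashes from the query, and repeats with what is left, so hashes shared between genomes are credited only once. The following is a minimal sketch of that idea over Python sets of made-up hashes; `gather`, `min_overlap`, and the reported fraction are simplifications for illustration, not sourmash's exact scoring or thresholds.

```python
def gather(query: set[int], references: dict[str, set[int]], min_overlap: int = 1):
    """Greedily decompose a query sketch into reference sketches.

    Each round picks the reference sharing the most hashes with the
    still-unexplained part of the query, then removes those hashes.
    """
    remaining = set(query)
    results = []
    while remaining:
        best_name, best_overlap = None, 0
        for name, hashes in references.items():
            overlap = len(remaining & hashes)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is None or best_overlap < min_overlap:
            break  # nothing left that any reference explains
        # Fraction of the original query newly explained by this match
        results.append((best_name, best_overlap / len(query)))
        remaining -= references[best_name]
    return results


refs = {"genome_A": {1, 2, 3, 4}, "genome_B": {4, 5, 6}, "genome_C": {9}}
metagenome = {1, 2, 3, 4, 5, 6, 7}
for name, frac in gather(metagenome, refs):
    print(f"{name}: {frac:.2f} of query")
```

Here hash 4 is shared by `genome_A` and `genome_B` but is credited only to `genome_A`, and hash 7 matches no reference, so it stays unexplained.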

## Other pointers

[Sourmash: a practical guide](https://sourmash.readthedocs.io/en/latest/using-sourmash-a-guide.html)

[Classifying signatures taxonomically](https://sourmash.readthedocs.io/en/latest/classifying-signatures.html)

[Pre-built search databases](https://sourmash.readthedocs.io/en/latest/databases.html)
