use cases #208

Open
ctb opened this Issue May 9, 2017 · 6 comments

@ctb
Member

commented May 9, 2017

This issue can serve as a placeholder for use cases for sourmash/MinHash more generally.

Stuff we already have implemented:

  • basic MinHash comparisons etc
  • metagenome taxonomy breakdown
  • streaming sequence classification
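
For readers new to the idea, here is a minimal sketch of a basic MinHash comparison. It is not sourmash's implementation: the SHA-1 based `hash_kmer`, the tiny `k`, and the sketch size `num` are illustrative assumptions, and reverse complements are ignored.

```python
import hashlib

def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")

def bottom_sketch(seq, k=5, num=20):
    """Keep the `num` smallest distinct k-mer hashes (a bottom-k sketch)."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:num]

def jaccard_estimate(a, b, num=20):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    merged = sorted(set(a) | set(b))[:num]  # bottom-k of the union
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)

if __name__ == "__main__":
    x = bottom_sketch("ACGTACGTTGCAAGCTTAGGCTAACGT")
    y = bottom_sketch("ACGTACGTTGCAAGCTTAGGCTTTCGA")
    print(jaccard_estimate(x, y))
```

The estimate only looks at the smallest hashes of the union, so its cost depends on `num` rather than on the size of the input sequences.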

Off-label and emerging use cases:

  • examining genomic contamination
  • comparing and validating different binning approaches
  • analysis of unknowns / hashes as stable identifiers
  • 16s etc. clustering

please add more here - we're in danger of forgetting all the great ideas we come up with ;)

@ctb

Member Author

commented May 9, 2017

tetramer nucleotide clustering

basic kmer searching (--scaled 1)
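
A rough illustration of why `--scaled 1` amounts to plain k-mer search, assuming a scaled sketch keeps every hash below `2**64 / scaled`. The helper names and the SHA-1 hash are hypothetical stand-ins, not sourmash's code.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space assumed here

def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")

def scaled_sketch(seq, k, scaled):
    """Keep only the k-mer hashes that fall below MAX_HASH / scaled."""
    threshold = MAX_HASH // scaled
    hashes = (hash_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1))
    return {h for h in hashes if h < threshold}

if __name__ == "__main__":
    seq = "ACGTACGTTGCA"
    full = scaled_sketch(seq, k=4, scaled=1)   # threshold covers the whole hash space
    print(len(full))                           # number of distinct 4-mers in seq
    print(hash_kmer("GTTG") in full)           # exact k-mer lookup
```

With `scaled=1` nothing is discarded, so membership in the sketch is an exact test for the presence of a k-mer (up to hash collisions); larger values trade that exactness for a smaller sketch.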

@ctb

Member Author

commented May 11, 2017

contamination detection

@ctb

Member Author

commented May 13, 2017

  • speed up genome-scale search and membership analysis of arbitrarily large WGS metagenomes by 1,000- to 1,000,000-fold.
  • cluster metagenome WGS data sets by similarity on very large scales
  • classify strain variants (as in the above blog post) very quickly
  • index public and private collections of metagenomes and genomes on the scale of ~100k+ to make them publicly and privately searchable.
  • identify known genomes in metagenomes very quickly
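
One way to picture "identify known genomes in metagenomes" is as a containment question: what fraction of a genome's k-mer hashes also occur in the metagenome? The toy sketch below assumes that framing. The sequences, `k`, and helper names are made up for illustration, and every hash is kept because the inputs are tiny; a real search would subsample.

```python
import hashlib

def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")

def kmer_hashes(seq, k=5):
    """Return the set of hashes of all k-mers in seq."""
    return {hash_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)}

def containment(query, target):
    """Fraction of the query's hashes that are also present in the target."""
    return len(query & target) / len(query) if query else 0.0

if __name__ == "__main__":
    genome = "ACGTTGCAAGCTTAGGCTA"
    metagenome = "TTTT" + genome + "GGGGCCCC"   # the genome embedded in other sequence
    unrelated = "AAAAAAAAAA"
    meta = kmer_hashes(metagenome)
    print(containment(kmer_hashes(genome), meta))
    print(containment(kmer_hashes(unrelated), meta))
```

Unlike Jaccard similarity, containment is asymmetric, so a small genome can score highly against a much larger metagenome.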
@ctb

Member Author

commented May 18, 2017

  • using our public database, find the NCBI accession of the genome you're working with
  • using our public database, find (all) strains of the genome you're working with
  • build a discovery & notification service for new SRA/GenBank/IMG/etc genomes
@ctb

Member Author

commented May 18, 2017

via Cameron Thrash, "when we have pure culture genomes and want to see in which datasets we can recruit large numbers of reads for ecological comparison"

@ctb

Member Author

commented May 29, 2017

I think "find NCBI accession of genome you're working with" could actually be expanded quite a bit - this could be a super convenient approach to getting full taxonomic information for something quickly, linking out to public databases, and cross-referencing across what NCBI/SRA/IMG/etc have made available. Actually a pretty exciting solution to a whole host of problems.
