MRG: use manysketch for sketching #15

Merged
merged 3 commits into from
Jan 2, 2024
@ctb ctb commented Jan 2, 2024

This PR changes the metagenome sketching code to use manysketch instead of sourmash sketch, which will hopefully be (much) faster for annoyingly large metagenomes.

While simple in concept, this requires a lot of extra machinery 😅:

  • each data file needs to be sketched individually first
  • then, the per-file sketches are combined into the final sketch

Each of these involves quite a few extra steps in practice!
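
To illustrate the sketch-then-combine idea, here is a toy, pure-Python sketch. It does not use the sourmash or manysketch APIs: the hash function, the `KSIZE` and `SCALED` values, the file names, and the helper names (`sketch_sequence`, `combine_sketches`) are all assumptions made for illustration. It only shows why per-file sketches built with the same parameters can be merged by taking the union of their retained hashes.

```python
import hashlib

# Illustrative parameters only; real metagenome sketches use much larger values.
KSIZE = 5
SCALED = 2
MAX_HASH = 2**64


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer (stand-in for a real sketching hash)."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def sketch_sequence(seq, ksize=KSIZE, scaled=SCALED):
    """Keep only k-mer hashes below MAX_HASH / scaled (a scaled-style subsample)."""
    threshold = MAX_HASH // scaled
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        h = kmer_hash(seq[i:i + ksize])
        if h < threshold:
            hashes.add(h)
    return hashes


def combine_sketches(sketches):
    """Merge per-file sketches that share ksize/scaled by taking the union."""
    combined = set()
    for hashes in sketches:
        combined |= hashes
    return combined


# Hypothetical data files for one metagenome, with made-up sequence content.
datafiles = {
    "run1.fq.gz": "ACGTACGTTGCATGCAAGGCT",
    "run2.fq.gz": "TTGCATGCAAGGCTAACCGGT",
}

# Step 1: sketch each data file on its own.
per_file = {name: sketch_sequence(seq) for name, seq in datafiles.items()}

# Step 2: combine the per-file sketches into one sketch for the metagenome.
combined = combine_sketches(per_file.values())
```

Because the same threshold is applied to every file, the union contains exactly the hashes that would have been retained had all the files been sketched together, which is what makes the two-step approach safe.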

We also introduce a diagnostic computation that shows datafile membership in the final sketches, as a confirmation.
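
A minimal sketch of what such a membership check could look like, assuming each sketch is represented as a plain set of integer hashes. The function name `membership_report`, the file names, and the hash values are hypothetical and are not taken from this PR's code.

```python
def membership_report(per_file_sketches, combined):
    """For each data file, report the fraction of its hashes found in `combined`."""
    report = {}
    for name, hashes in per_file_sketches.items():
        if not hashes:
            # An empty sketch has nothing to check; report 0.0 rather than divide by zero.
            report[name] = 0.0
            continue
        report[name] = len(hashes & combined) / len(hashes)
    return report


# Hypothetical per-file sketches, represented as sets of hash values.
per_file = {
    "run1.fq.gz": {1, 5, 9},
    "run2.fq.gz": {5, 12},
    "other.fq.gz": {7, 9},
}

# The combined sketch is built from run1 and run2 only.
combined = per_file["run1.fq.gz"] | per_file["run2.fq.gz"]

report = membership_report(per_file, combined)
for name, fraction in sorted(report.items()):
    print(f"{name}\t{fraction:.2f}")
```

Files that went into the combined sketch should report 1.0; anything lower suggests the file was left out or sketched with different parameters.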

Fixes #7

@ctb ctb changed the title WIP: use manysketch for sketching MRG: use manysketch for sketching Jan 2, 2024
@ctb ctb merged commit b42b49a into main Jan 2, 2024
@ctb ctb deleted the use_manysketch branch January 2, 2024 13:22
Successfully merging this pull request may close these issues.

sketch data files individually first, then combine sketches?