Implement lightweight SBT combining/adding for large SBTs #229
In response to @meren,
We can actually do this in a few different ways:
the heaviest weight way right now is to combine or update the database, which is not that time- or resource-intensive but is still inconvenient. (The database can be updated mostly incrementally; it’s a Sequence Bloom Tree underneath.) We have a command-line way to do this with ‘sourmash sbt_combine’.
the medium weight way (mostly just frustrating) is to have sbt_gather output unknown bits of the signature. Then you could do iterative search (run sbt_gather on database A, take what remains, run
the lightest weight way to do this is not yet supported but is about an hour of hacking away: let the sbt_gather and sbt_search commands take multiple SBTs. The SBT search is very lightweight in terms of memory and resources (searching all of GenBank takes seconds and < 500 MB of RAM), so simply running it 2x or 3x across multiple databases and then massaging the results is not difficult. But I am trying to be a bit careful about complexifying the command line, so I am hesitant to blindly add it. Easy to do once we need it, tho.
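
To make the lightest weight option concrete, here is a rough sketch of "search several databases, then massage the results". It does not use the sourmash API. Each database is modeled as a plain dict from signature name to a set of hashes, and `jaccard`, `search_one`, and `search_many` are hypothetical names invented for illustration. A real SBT search would walk the tree rather than scan every entry.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for an SBT: signature name -> set of hash values.
Database = Dict[str, Set[int]]
Match = Tuple[float, str]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two hash sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search_one(query: Set[int], db: Database, threshold: float) -> List[Match]:
    """Return (similarity, name) for every entry at or above the threshold."""
    matches = []
    for name, hashes in db.items():
        similarity = jaccard(query, hashes)
        if similarity >= threshold:
            matches.append((similarity, name))
    return matches


def search_many(query: Set[int], dbs: List[Database], threshold: float = 0.1) -> List[Match]:
    """Search each database independently, then merge into one ranked list."""
    merged: List[Match] = []
    for db in dbs:
        merged.extend(search_one(query, db, threshold))
    # Best match first; ties broken by name so the order is deterministic.
    merged.sort(key=lambda match: (-match[0], match[1]))
    return merged


# Toy data, made up for this example.
query = {1, 2, 3, 4}
db_a = {"a1": {1, 2, 3, 4}, "a2": {9, 10}}
db_b = {"b1": {1, 2, 5, 6}}
results = search_many(query, [db_a, db_b])
```

Because each database is searched independently, the only new work is the merge step, which is why running the search 2x or 3x stays cheap.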