There's quite a bit of interest (#2816, #3070, #3089) in doing contig-level/long-read gather and (maybe) taxonomy assignment for contigs/long reads. Here's a short example that uses fastmultigather to do this.
fastmultigather currently needs a rocksdb database built with sourmash scripts index in order to generate complete gather output. Indexing will take some time for large databases but it will be worth it ;). See the rocksdb docs for reference.
fastmultigather is multithreaded, and, with a rocksdb index, is also very low memory. Enjoy!
# make working dir
mkdir podar-ref-singleton
cd podar-ref-singleton
# download example data
curl -L https://osf.io/vbhy5/download -o podar-ref.tar.gz
# unpack
tar xzf podar-ref.tar.gz
# sketch twice - once with all contigs using --singleton, once combining each file
sourmash sketch dna --singleton *.fa -o podar-ref-singleton.zip
sourmash sketch dna --name-from-first *.fa -o podar-ref-genomes.zip
# index database so that fastmultigather can produce all gather columns
# this will take a while if you do it for large databases!
sourmash scripts index podar-ref-genomes.zip -o podar-ref.rocksdb
# run fastmultigather
sourmash scripts fastmultigather podar-ref-singleton.zip podar-ref.rocksdb -o gather.csv
# all your gather results will be in gather.csv
# grab lineage file
curl -L https://osf.io/4yhjw/download -o podar-ref.tax.csv
sourmash tax genome -g gather.csv -t podar-ref.tax.csv -F human -o out
# all results will be in out.human.txt
Hi @ctb,
I spent some time reviewing your tutorial and tested it several times, but it fails at the final step, namely sourmash tax genome -g gather.csv -t podar-ref.tax.csv -F human -o out.
The good news is that fastmultigather does its magic and speeds up the gather step, which is fantastic! I am also fairly sure the results are what we want: query_name holds the contig names within a genome (the query) and match_name holds the reference genome name.
The error message is as follows:
singularity exec -B `pwd` -B /fsx /fsx/singularity/branchwater.0.8.5.sif bash test.sh
== This is sourmash version 4.8.5. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==
Exiting.
ERROR: 'gather.csv' is missing columns needed for taxonomic summarization. Please run gather with sourmash >= 4.4.
PS: the content of test.sh is: sourmash tax genome -g gather.csv -t podar-ref.tax.csv -F human -o out
The expected output is out.human.txt, containing the taxonomic information.
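Since the error above complains about missing columns, one quick way to narrow things down is to compare the header row of gather.csv against a list of column names. The following is a minimal sketch, not part of sourmash: `missing_columns` is a hypothetical helper, and the column names you pass to it are your own assumption about what the tax step needs (the error message does not list them).

```shell
# Hypothetical helper (not part of sourmash): print every requested column
# name that does not appear in the header row of a CSV file.
# Usage: missing_columns FILE COLUMN [COLUMN...]
missing_columns() {
    csv_file=$1
    shift
    # Read only the header line, strip any carriage return,
    # and put one column name on each line.
    header=$(head -n 1 "$csv_file" | tr -d '\r' | tr ',' '\n')
    for col in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string.
        if ! printf '%s\n' "$header" | grep -qxF -- "$col"; then
            printf '%s\n' "$col"
        fi
    done
}
```

For example, `missing_columns gather.csv query_name match_name` prints nothing if both columns are present, and prints the name of each column that is absent otherwise. Only the header line is inspected, so quoted commas inside data rows do not affect the check.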
Apologies for the complicated answer. This should be resolved in the next few weeks... but for now... it's a bit of a mess.
Question: are you using a rocksdb index? The current release of the plugin, v0.9.3, only supports full gather output when using fastmultigather against a rocksdb index.
However, the bad news is that testing has since revealed a bug in fastmultigather when run against a rocksdb index: it returns incomplete results; see sourmash-bio/sourmash_plugin_branchwater#322. (The good news, such as it is, is that the results are accurate when using fastgather/fastmultigather NOT against a rocksdb index...)
I'll update you here when we have fixed the problems and released a new version. Apologies, things got tricky with all our different efforts to speed things up!
A few notes:
fastmultigather is part of the branchwater plugin, which can be installed with conda. See the docs for fastmultigather specifically.
fastmultigather quickstart using small data sets
hackmd for editing: https://hackmd.io/ztM-7ZJoSYahMMPde7Q5vw?view
Related issues:
multigather documentation to be clearer, and to recommend fastmultigather #3069
sourmash multigather for 5.0 #1614