using fastmultigather to do contig-level gather and taxonomy assignment - a brief tutorial #3095

Open
ctb opened this issue Mar 22, 2024 · 2 comments
Labels: doc (documentation content or issues), faq (things to add to an FAQ or docs)

Comments

@ctb
Contributor

ctb commented Mar 22, 2024

There's quite a bit of interest (#2816, #3070, #3089) in doing contig-level/long-read gather and (maybe) taxonomy assignment for contigs/long reads. Here's a short example that uses fastmultigather to do this.

A few notes -

fastmultigather quickstart using small data sets

hackmd for editing: https://hackmd.io/ztM-7ZJoSYahMMPde7Q5vw?view

# make working dir
mkdir podar-ref-singleton
cd podar-ref-singleton

# download example data
curl -L https://osf.io/vbhy5/download -o podar-ref.tar.gz

# unpack
tar xzf podar-ref.tar.gz

# sketch twice: once per contig (using --singleton), and once per genome file
sourmash sketch dna --singleton *.fa -o podar-ref-singleton.zip
sourmash sketch dna --name-from-first *.fa -o podar-ref-genomes.zip

# index database so that fastmultigather can produce all gather columns
# this will take a while if you do it for large databases!
sourmash scripts index podar-ref-genomes.zip -o podar-ref.rocksdb

# run fastmultigather
sourmash scripts fastmultigather podar-ref-singleton.zip podar-ref.rocksdb -o gather.csv

# all your gather results will be in gather.csv

# grab lineage file
curl -L https://osf.io/4yhjw/download -o podar-ref.tax.csv

sourmash tax genome -g gather.csv -t podar-ref.tax.csv -F human -o out

# all results will be in out.human.txt

Related issues:

@yuzie0314

Hi @ctb,
I have spent some time reviewing your tutorial and tested it several times, but it fails at the final step: sourmash tax genome -g gather.csv -t podar-ref.tax.csv -F human -o out

The good news is that fastmultigather really does speed up the gather step, which is fantastic! I am also confident that the results are what we want: query_name holds the names of the contigs within a genome (the query), and match_name holds the reference genome name.

The error message is as follows:

singularity exec -B `pwd` -B /fsx /fsx/singularity/branchwater.0.8.5.sif bash test.sh

== This is sourmash version 4.8.5. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

Exiting.
ERROR: 'gather.csv' is missing columns needed for taxonomic summarization. Please run gather with sourmash >= 4.4.

P.S. The content of test.sh is:
sourmash tax genome -g gather.csv -t podar-ref.tax.csv -F human -o out

The expected output is out.human.txt, containing the taxonomic information.

[Screenshot 2024-05-13 112309]

Thank you in advance,
Yuzie

@ctb
Contributor Author

ctb commented May 14, 2024

ok, took me a second 😆 😭

and apologies for the complicated answer. This should be resolved in the next few weeks... but for now... it's a bit of a mess.

Question: are you using a rocksdb index? The current release of the plugin, v0.9.3, only supports full gather output when using fastmultigather against a rocksdb index.

This will be updated in the next release, since sourmash-bio/sourmash_plugin_branchwater#298 was merged!
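One quick way to see whether a given gather.csv has the full set of columns is to compare its header row against the columns you expect. The helper below is only a sketch: missing_columns is a hypothetical name, it assumes a plain comma-separated header with no quoted commas, and it does not know which columns sourmash tax actually requires, so the list you pass in is up to you.

```shell
# Hypothetical helper (not part of sourmash): print every requested column
# name that does NOT appear in the header row of a CSV file.
# Usage: missing_columns FILE COLUMN [COLUMN ...]
# Assumes a simple header: comma-separated, no quoted fields containing commas.
missing_columns() {
    csv_file="$1"
    shift
    # Take the first line, drop any carriage return, and put one column name per line.
    header_cols=$(head -n 1 "$csv_file" | tr -d '\r' | tr ',' '\n')
    for col in "$@"; do
        # -x whole-line match, -F fixed string, -q quiet
        if ! printf '%s\n' "$header_cols" | grep -qxF -- "$col"; then
            printf '%s\n' "$col"
        fi
    done
}
```

For example, `missing_columns gather.csv query_name name f_unique_weighted` prints whichever of those three names is absent from the header. Those names are columns of regular sourmash gather output and are used here for illustration only; check the sourmash tax documentation for the authoritative list.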

However, the bad news is that testing has since revealed a bug in fastmultigather against a rocksdb index: it returns incomplete results; see sourmash-bio/sourmash_plugin_branchwater#322. (The good news, such as it is, is that the results are accurate when using fastgather/fastmultigather NOT against a rocksdb index...)

SO, for now, the solution is: use fastgather or fastmultigather WITHOUT a rocksdb index, and then run sourmash gather using a picklist, per https://github.com/sourmash-bio/sourmash_plugin_branchwater/blob/main/doc/README.md#using-fastgather-to-create-a-picklist-for-sourmash-gather.
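As a rough sketch of that workaround for a single query signature: run fastgather against the plain (non-rocksdb) database, then rerun sourmash gather restricted to the matches it found. The function name gather_via_picklist, its arguments, and the intermediate file name are hypothetical, and the match_name:ident picklist spec is my reading of the README linked above, so treat the README as the authority if the two differ.

```shell
# Hypothetical wrapper (names are illustrative, not from the tutorial).
# Usage: gather_via_picklist QUERY_SIG DATABASE OUT_CSV
#   QUERY_SIG - a file containing ONE query sketch
#   DATABASE  - a sketch collection that is NOT a rocksdb index (e.g. a .zip)
#   OUT_CSV   - where the sourmash gather results should be written
gather_via_picklist() {
    query="$1"
    db="$2"
    out_csv="$3"
    # Intermediate fastgather output, reused as the picklist (assumed naming).
    picklist_csv="${out_csv%.csv}.fastgather.csv"

    # Step 1: fast prefilter with the branchwater plugin.
    # Step 2: regular gather, limited to the matches fastgather reported.
    sourmash scripts fastgather "$query" "$db" -o "$picklist_csv" &&
    sourmash gather "$query" "$db" \
        --picklist "$picklist_csv:match_name:ident" -o "$out_csv"
}
```

Called as `gather_via_picklist query.sig.gz podar-ref-genomes.zip gather.csv`, this handles one query at a time; for many contigs you would have to loop over per-contig signatures yourself.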

I'll update you here when we have fixed the problems and released a new version. Apologies, things got tricky with all our different efforts to speed things up!

Related issue:
