Database Download & Documentation #141

Open
gbouras13 opened this issue Aug 29, 2023 · 1 comment

gbouras13 commented Aug 29, 2023

G'day @psj1997, @luispedro and other SemiBin developers,

Firstly, thanks for SemiBin(2). It works amazingly well and recovers many more bins than other binning methods :)

I want to share some feedback regarding the database download and SemiBin's documentation.

The HPC cluster at my institution blocks internet access on compute nodes, so the lazy download of the SemiBin2 database failed when I ran the command below (SemiBin v1.5.1, installed on Linux via Bioconda).

SemiBin2 multi_easy_bin -i {input.catalogue}  -b {input.bams} -o {params.outdir} -s {params.separator} --minfasta-kbs {params.minfasta}

It was hard to work out that this was the cause: the database is not mentioned in the README, only in the FAQ section of the docs, and the error message was not informative (apologies, I have since overwritten the log file, otherwise I would quote it).

I then followed the FAQ to download the updated GTDB database, but the following command does not work in MMseqs2 v13.45111 because of a known MMseqs2 error (soedinglab/MMseqs2#561):

mmseqs databases GTDB GTDB tmp

Then, after looking at the SemiBin codebase, I was able to install the database manually:

wget 'https://zenodo.org/record/4751564/files/GTDB_v95.tar.gz?download=1'
mv 'GTDB_v95.tar.gz?download=1' GTDB_v95.tar.gz
tar -xzvf GTDB_v95.tar.gz

From there, I specified -r {params.db} and SemiBin worked perfectly.
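
For anyone repeating the manual install, here is a minimal sketch of how the rename step could be avoided. It assumes wget is available on a node with internet access (for example a login node), and the variable names are only illustrative. The download and extraction lines are left as comments so the snippet is safe to run anywhere; the live part only shows how the plain archive name can be derived from the Zenodo URL, since the ?download=1 query string otherwise ends up in the saved filename.

```shell
# URL taken from the commands above.
url='https://zenodo.org/record/4751564/files/GTDB_v95.tar.gz?download=1'

# Keep only the last path component, then drop the query string.
archive=${url##*/}        # GTDB_v95.tar.gz?download=1
archive=${archive%%\?*}   # GTDB_v95.tar.gz

echo "$archive"

# On a node with internet access, the download and extraction would then be:
#   wget -O "$archive" "$url"
#   tar -xzvf "$archive"
```

Passing -O to wget writes the file under the chosen name directly, so no mv is needed afterwards.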

So perhaps either adding a dedicated --download_database flag or script, or just documenting a manual install method, would help future users like me whose compute nodes have no internet access.
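
In the meantime, a manual install could be paired with a small pre-flight check in the job script, so that a missing database fails fast with a clear message rather than an attempted download on an offline node. The sketch below is only an illustration: check_db is a hypothetical helper, not part of SemiBin, and it only tests that the directory passed to -r exists and is not empty. It makes no assumption about what the extracted archive contains.

```shell
# Hypothetical helper (not part of SemiBin): fail early if the
# database directory is missing or empty.
check_db() {
    db_dir=$1
    if [ ! -d "$db_dir" ]; then
        echo "database directory not found: $db_dir" >&2
        return 1
    fi
    # ls -A lists every entry except . and .., so empty output means an empty directory.
    if [ -z "$(ls -A "$db_dir")" ]; then
        echo "database directory is empty: $db_dir" >&2
        return 1
    fi
    return 0
}
```

In a job script this could be called as `check_db "$db" || exit 1` before the SemiBin2 command, where `$db` is whatever directory is passed to -r.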

George

@luispedro
Member

If you are calling SemiBin2, it should not be downloading the MMseqs2 DB anymore. I will double-check that we did not mistakenly leave that in.
