An error occured during blastp run with query '-' #55

Open
rzhan186 opened this issue Jun 8, 2023 · 3 comments

rzhan186 commented Jun 8, 2023

Dear mdmcleaner developers,

I experienced a blastp error during mdmcleaner clean, which resulted in a runtime error. Could you help me troubleshoot it, please?

I was running mdmcleaner on a compute cluster in a Python 3.11 virtual environment with the full mdmcleaner database. I've attached the log file here.

Meanwhile, I will try the database used in the publication to see whether this error is caused by the database.

Thank you for your help!

mdmcleaner_out.txt

Sincerely,
Rui

Author

rzhan186 commented Jun 9, 2023

Just an update: with the reduced-size database, I was able to run mdmcleaner clean successfully, which brought the contamination score down from 14 to 7 as reported by checkm2. I therefore suspect this might be a database-related issue.

Another question: it took about 9.5 hours with 125 GB RAM on a compute cluster to decontaminate one MAG of 3.3 million bases, so decontaminating hundreds of MAGs might take an exceptionally long time. Is there any way to speed up the process? For example, will it run faster if I provide multiple MAGs at the same time?

Thank you!

Collaborator

jvollme commented Jul 5, 2023

Sorry for the late reply.
Good that you found a workaround for the reference library. However, the error message mentions processes aborting with the signal "Signals.SIGABRT: 6". I am not sure, but I think this indicates that the process was terminated on the side of your server (maybe you ran out of disk space, or the queuing system automatically killed your process after a certain time?).
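To make the "Signals.SIGABRT: 6" message easier to interpret, here is a minimal Python sketch of how a subprocess return code maps to a signal name. The helper name describe_returncode is hypothetical and not part of mdmcleaner; it only relies on the POSIX convention that a child killed by signal N is reported with return code -N.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by {name} ({-returncode})"
    if returncode == 0:
        return "finished successfully"
    return f"exited with status {returncode}"


# A return code of -6 corresponds to the signal named in the log.
print(describe_returncode(-6))
```

As a general note, signal 6 is what a program raises on itself through abort(), whereas job schedulers more commonly send SIGTERM or SIGKILL, so comparing the signal in the log against the scheduler's accounting output may help narrow down the cause.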
Regarding the number of input genomes: indeed, mdmcleaner is not at all meant to be run separately for each genome!
The -i option takes multiple arguments, so you can supply as many genomes as you want at the same time, e.g. like this: mdmcleaner clean -i inputfolder/*.fasta.gz
Running it individually means it has to load the reference database again each time, and also re-run the blasts for reference database ambiguities again and again. Running it once for all inputs means these runs and instances are shared.
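For anyone launching mdmcleaner from a script, the batch approach described above could be sketched as follows. This is only an illustration: build_clean_command is a hypothetical helper, the temporary folder and file names are placeholders, and the only mdmcleaner syntax used is the "clean -i" form quoted in this thread.

```python
import shlex
import tempfile
from pathlib import Path


def build_clean_command(input_dir, pattern="*.fasta.gz"):
    """Return one `mdmcleaner clean` call covering every matching genome."""
    genomes = sorted(str(path) for path in Path(input_dir).glob(pattern))
    if not genomes:
        raise FileNotFoundError(f"no files matching {pattern!r} in {input_dir}")
    # -i accepts multiple arguments, so all genomes share one invocation.
    return ["mdmcleaner", "clean", "-i", *genomes]


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("bin1.fasta.gz", "bin2.fasta.gz", "notes.txt"):
        (Path(folder) / name).touch()
    command = build_clean_command(folder)
    print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) inside a single cluster job, so the reference database is loaded once for all genomes rather than once per genome.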

Author

rzhan186 commented Jul 5, 2023

Hi @jvollme, thanks for your reply! Since I couldn't solve the previous database error, I tried re-downloading the updated database. While doing this, a new error arose about a missing md5sum file. I looked at the source code of read_gtdb_taxonomy.py and found that the script pulls the database from https://data.ace.uq.edu.au/public/gtdb/data/releases/latest, but unlike previous releases, this folder currently contains no MD5SUM.txt file. As a workaround, I manually downloaded MD5SUM.txt from https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.1/, placed it in the mdmcleaner database folder, and deleted the "_r214" parts in the file while mdmcleaner makedb was running. Eventually the database build finished successfully, and mdmcleaner clean now works perfectly. Just a heads-up for others who might be experiencing the same issue.
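The manual edit of MD5SUM.txt described above could be scripted roughly like this. It is a sketch under assumptions: the helper names are hypothetical, the file names and checksum placeholders in the example are illustrative rather than copied from the real GTDB file, and the only step taken from the post is removing the "_r214" tag.

```python
from pathlib import Path


def strip_release_tag(text, tag="_r214"):
    """Remove the release tag from every line of an MD5SUM listing."""
    # Checksums are hexadecimal, so the tag can only occur in file names.
    return text.replace(tag, "")


def rewrite_md5sum_file(path, tag="_r214"):
    """Rewrite an MD5SUM file in place with the release tag removed."""
    path = Path(path)
    path.write_text(strip_release_tag(path.read_text(), tag))


# Illustrative input; checksums and file names are placeholders.
example = (
    "<checksum-1>  bac120_taxonomy_r214.tsv\n"
    "<checksum-2>  ar53_taxonomy_r214.tsv\n"
)
print(strip_release_tag(example), end="")
```

Calling rewrite_md5sum_file on the downloaded MD5SUM.txt would apply the same change to the file in the database folder. Whether the untagged names match what makedb expects should be checked against the actual download.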

However, there is one thing I am not sure about. When I check the DB_versions.txt file, I get the following:

GTDB version = None
RefSeq release = release218
silva_download_dict = 138.1

I am pretty sure that I have GTDB release 214.1, so why does it not show up in this file? Is it possible that GTDB wasn't downloaded successfully but mdmcleaner clean still managed to run?

I've attached all files in the mdmcleaner database folder here
mdmcleaner_database_files.txt
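A quick way to flag unrecorded versions in a file like the DB_versions.txt shown above is sketched below. It assumes the "key = value" layout from the post and uses hypothetical helper names. It only reports which entries are recorded as None and cannot tell whether the GTDB files themselves were downloaded.

```python
def parse_db_versions(text):
    """Parse 'key = value' lines into a dict, mapping 'None' to None."""
    versions = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or unexpected lines
        key, _, value = line.partition("=")
        value = value.strip()
        versions[key.strip()] = None if value in ("", "None") else value
    return versions


def missing_versions(versions):
    """Return the keys whose version was not recorded."""
    return [key for key, value in versions.items() if value is None]


# File content as quoted in the post.
sample = (
    "GTDB version = None\n"
    "RefSeq release = release218\n"
    "silva_download_dict = 138.1\n"
)
versions = parse_db_versions(sample)
print(missing_versions(versions))
```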
