Diamond blastx out-of-memory #397
Comments
How much memory have you allocated to the job in your slurm submit script? It could be that frameshift alignments or range culling lead to increased memory usage. Could you try without these options? How long is your longest query?
Here is my slurm script:

#SBATCH --export=ALL   # export all environment variables to the batch job

I have used both the high memory node (32 cores, 3 TB) and the standard node (16 cores, 128 GB) and got the same errors. Do I need to ask for more memory, even on the high memory node? My longest query is 15.5 Mbp. I have just submitted the script without the -F and --range-culling parameters and it does seem to be running OK so far.
Yes, I think you probably need to request more memory in your submit script.
Unfortunately, even with the high memory node and 1000G memory (the maximum I can request), it runs out of memory after 4 1/2 hours of run time. My slurm script is below and I've attached the logfile. Is there any way I can execute diamond blastx with these files using -F and --range-culling without consuming so much memory? I have also tried adjusting the -b parameter to 1, but that doesn't seem to help.
I'm not sure what causes this high memory use and will have to look into it. If you want, you can send me your query file so I can try to reproduce your run.
Thank you for looking into this. The file is too big to upload; can I send it to your email via WeTransfer?
Sure, my email is buchfink@gmail.com
It was the DP matrices in traceback mode that were using up too much memory. This should fix the issue: 199cd79. Using this I was able to run your dataset with about 40 GB of memory use (with the default block size of 2, which is fine).
Sorry to be a nuisance, but I still seem to have an error after re-installing diamond and re-running diamond blastx.
Slurm script:
I have attached the logfile.
Bus error does not seem like a memory problem any more. How much free space does your /tmp/ folder have?
I just re-ran the same command but without the /tmp/ and got this error:
In terms of free space, I have quite a lot:
Not sure, since I tested it with the same file and it worked fine. Please double-check that you have cloned the latest version of the repo, compiled from source, and are running that version of Diamond.
Hi, I have a similar problem. I have been using Diamond, first version v2.0.9, now v2.0.12, on a cluster to search for sequences in a collection of assemblies. The sequences I collected myself; there are about 150 of them, around 1500 bp each. Things work fine for the most part. However, on some assemblies that are bigger, I ran out of memory, even when using 1000G. Initially, I used [...]. It is worth mentioning that I am also using [...]. This is my current setup:
Any suggestion would be highly appreciated.
How big are your assemblies, and how many threads do you run?
Thanks for your reply. I run 16 CPU threads with 1000G memory, but with less memory (128 GB) I could run 28 CPU threads. One of the assemblies is 3.6 GB uncompressed.
How long is the longest contig?
For this particular assembly, these are the specs:
The longest I tested were bacterial chromosomes, but queries of >700 MB can easily break the current code. I do plan to rework the blastx mode, which will probably happen in the next weeks, but I can't offer you an easy solution now. These may be options that work: extract ORFs and run the blastp mode on them.
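The ORF-extraction workaround suggested above can be sketched roughly as follows. This is a minimal pure-Python six-frame scan, not DIAMOND's own preprocessing: the minimum ORF length, the ATG-start rule, and the use of the standard translation table are all assumptions, and a real pipeline would normally use a dedicated ORF finder before running diamond blastp.

```python
from itertools import product

# Standard genetic code, built in TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def orfs(seq, min_aa=60):
    """Yield peptides (Met..stop) of at least min_aa residues
    from all six reading frames. min_aa=60 is an arbitrary cutoff."""
    seq = seq.upper()
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            for chunk in protein.split("*"):
                m = chunk.find("M")
                if m != -1 and len(chunk) - m >= min_aa:
                    yield chunk[m:]
```

The resulting peptides can be written to a FASTA file and searched with diamond blastp, which sidesteps the long-query blastx code path entirely.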
I see. I had thought about the second option. Another alternative I considered was to run each scaffold independently, but I guess this would not work, since the problem seems to be the length, correct? I will try as you suggest. Thanks a lot for your input, and please let me know when you update blastx.
You could try that too, but I assume that the length is the problem. May I also ask why extracting ORFs is not an option for you? Are you looking for alignments that span stop codons?
I thought so. I will definitely try extracting ORFs as well. I just had not thought about it. |
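For completeness, the per-scaffold splitting alternative discussed above could look like the sketch below. The output directory layout and file naming are assumptions, and, as noted in the thread, this will not help when a single contig is itself longer than the blastx code can handle.

```python
import os
import re

def split_fasta(path, outdir):
    """Write each record of a multi-FASTA file to its own file in outdir,
    named after the sequence ID; returns the list of files written."""
    os.makedirs(outdir, exist_ok=True)
    out = None
    written = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if out:
                    out.close()
                # Sanitize the sequence ID so it is safe as a filename.
                name = re.sub(r"[^\w.-]", "_", line[1:].split()[0])
                fname = os.path.join(outdir, name + ".fasta")
                written.append(fname)
                out = open(fname, "w")
            if out:
                out.write(line)
    if out:
        out.close()
    return written
```

Each resulting file could then be submitted as its own diamond blastx job, which at least bounds the memory of any one job by its largest scaffold.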
Hi, I want to run diamond blastx on a nr protein database created using the following commands:
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
diamond makedb --in nr.gz -d nr
My query is a 1.7G FASTA file and the nr.dmnd database file is 153G. According to the logfile of prior runs, "The host system is detected to have 134 GB of RAM".
However, I keep getting errors (not always the same error), which all seem to be related to memory. I have adjusted the -b and -c parameters but I still get errors related to memory. I have attached the logfile of my latest run and was hoping you could help me solve this issue. Thank you in advance.
diamond blastx -d ~/aqualeap/databases/nr -q ../curated.fasta --outfmt 100 -o diamond_nr_contigs.daa -t /tmp/ --salltitles -F 15 --range-culling --top 10 -p 16 -b 0.4
Error:
slurm-87627.out.txt