Diamond blastx out-of-memory #397

Closed
Tom-Jenkins opened this issue Oct 14, 2020 · 21 comments

@Tom-Jenkins

Hi, I want to run diamond blastx on a nr protein database created using the following commands:

wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
diamond makedb --in nr.gz -d nr

My query is a 1.7G FASTA file and the nr.dmnd database file is 153G. According to the logfile of prior runs, "The host system is detected to have 134 GB of RAM".

However, I keep getting errors (not always the same error), which all seem to be related to memory. I have adjusted the -b and -c parameters but I still get errors related to memory. I have attached the logfile of my latest run and was hoping you could help me solve this issue. Thank you in advance.

diamond blastx -d ~/aqualeap/databases/nr -q ../curated.fasta --outfmt 100 -o diamond_nr_contigs.daa -t /tmp/ --salltitles -F 15 --range-culling --top 10 -p 16 -b 0.4

Error:

Computing alignments... /var/spool/slurmd/job87627/slurm_script: line 14:   431 Killed                  diamond blastx -d ~/aqualeap/databases/nr -q ../curated.fasta -a diamond_nr_contigs.daa -t /tmp/ --salltitles -F 15 --range-culling --top 10 -p 16 -b 0.4
slurmstepd: error: Detected 1 oom-kill event(s) in step 87627.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

slurm-87627.out.txt

@bbuchfink
Owner

How much memory have you allocated to the job in your slurm submit script? It could be that frameshift alignments or range culling lead to increased memory usage. Could you try without these options? How long is your longest query?

@Tom-Jenkins
Author

Here is my slurm script:
#!/bin/bash

#SBATCH --export=ALL # export all environment variables to the batch job
#SBATCH -D . # set working directory to .
#SBATCH -p pq # submit to the parallel queue
#SBATCH --time=12:00:00 # maximum walltime for the job
#SBATCH -A Research_Project-T109743 # research project to submit under
#SBATCH --nodes=1 # specify number of nodes
#SBATCH --ntasks-per-node=16 # specify number of processors per node
#SBATCH -p highmem

I have used both the high memory node (32 cores, 3 TB) and the standard node (16 cores, 128 GB) and got the same errors. Do I need to ask for more memory, even on the high memory node?

My longest query is 15.5 Mbp. I have just submitted the script without the -F and --range-culling parameters and it does seem to be running OK so far.
diamond blastx -d ~/aqualeap/databases/nr -q ../curated.fasta --outfmt 100 -o diamond_nr_contigs.daa -t /tmp/ --salltitles --top 10 -p 16

@bbuchfink
Owner

Yes, I think you probably need to request more memory in your submit script.
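
For reference, a minimal sketch of what that could look like in the submit script above; the 256G figure is only an illustrative placeholder, not a value recommended in this thread, and without an explicit --mem request SLURM may apply a much smaller per-node or per-CPU default:

#SBATCH -p highmem
#SBATCH --mem=256G # explicit memory request for the job; adjust to what the partition actually allows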

@Tom-Jenkins
Author

Tom-Jenkins commented Oct 15, 2020

Unfortunately, even with the high memory node and 1000G of memory (the maximum I can request), it runs out of memory after 4.5 hours of run time. My slurm script is below and I've attached the logfile. Is there any way I can run diamond blastx on these files with -F and --range-culling without consuming so much memory? I have also tried adjusting the -b parameter to 1, but that doesn't seem to help.

#!/bin/bash

#SBATCH --export=ALL # export all environment variables to the batch job
#SBATCH -D . # set working directory to .
#SBATCH -p pq # submit to the parallel queue
#SBATCH --time=12:00:00 # maximum walltime for the job
#SBATCH -A Research_Project-T109743 # research project to submit under
#SBATCH --nodes=1 # specify number of nodes
#SBATCH --ntasks-per-node=16 # specify number of processors per node
#SBATCH -p highmem
#SBATCH --mem=1000G
#SBATCH --mail-type=END # send email at job completion
#SBATCH --mail-user=t.l.jenkins@exeter.ac.uk # email address

# Commands
diamond blastx -d ~/aqualeap/databases/nr -q ../curated.fasta --outfmt 100 -o diamond_nr_contigs.daa -t /tmp/ --salltitles -F 15 --range-culling --top 10 -p 16 -c 1 -b 10

slurm-88567.out.txt

@bbuchfink
Owner

I'm not sure what causes this high memory use and will have to look into it. If you want, you can send me your query file so I can try to reproduce your run.

@Tom-Jenkins
Author

Thank you for looking into this. The file is too big to upload; can I send it to your email via WeTransfer?

@bbuchfink
Owner

Sure, my email is buchfink@gmail.com

@bbuchfink
Owner

It was the DP matrices in traceback mode that were using up too much memory. This should fix the issue: 199cd79

Using this I was able to run your dataset with about 40 GB of memory use (with the default block size of 2, which is fine).

@Tom-Jenkins
Author

Sorry to be a nuisance, but I still seem to have an error after re-installing diamond and re-running diamond blastx.

Computing alignments... /var/spool/slurmd/job95338/slurm_script: line 15: 26030 Bus error (core dumped) diamond blastx -d ~/aqualeap/databases/nr -q ../curated.fasta --outfmt 100 -o diamond_nr_contigs2.daa -t /tmp/ --salltitles -F 15 --range-culling --top 10 -p 8
Isca HPC: Slurm Job_id=95338 Name=isca-diamond2.sh Ended, Run time 00:22:45, FAILED, ExitCode 135

Slurm script:

#!/bin/bash

#SBATCH --export=ALL # export all environment variables to the batch job
#SBATCH -D . # set working directory to .
#SBATCH -p pq # submit to the parallel queue
#SBATCH --time=24:00:00 # maximum walltime for the job
#SBATCH -A Research_Project-T109743 # research project to submit under
#SBATCH --nodes=1 # specify number of nodes
#SBATCH --ntasks-per-node=8 # specify number of processors per node
#SBATCH -p highmem
#SBATCH --mail-type=END # send email at job completion
#SBATCH --mail-user=t.l.jenkins@exeter.ac.uk # email address

# Commands
diamond blastx -d ~/aqualeap/databases/nr -q ../curated.fasta --outfmt 100 -o diamond_nr_contigs2.daa -t /tmp/ --salltitles -F 15 --range-culling --top 10 -p 8

I have attached the logfile.
slurm-95338.out.txt

@bbuchfink
Owner

Bus error does not seem like a memory problem any more. How much free space does your /tmp/ folder have?

@Tom-Jenkins
Author

I just re-ran the same command but without the /tmp/ and got this error:

Computing alignments... /var/spool/slurmd/job95541/slurm_script: line 15: 831 Killed diamond blastx -d ~/aqualeap/databases/nr -q ../curated.fasta --outfmt 100 -o diamond_nr_contigs2.daa --salltitles -F 15 --range-culling --top 10 -p 8 slurmstepd: error: Detected 1 oom-kill event(s) in step 95541.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

In terms of free space, I have quite a lot:

Filesystem      Size  Used Avail Use% Mounted on
ts0              10T  6.3T  3.8T  63% /gpfs/ts0

@bbuchfink
Owner

I'm not sure, since I tested it with the same file and it worked fine. Please double-check that you have cloned the latest version of the repo, compiled it from source, and are running that version of Diamond.
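
In case it is useful, a minimal sketch of a from-source rebuild along the lines of the steps in the DIAMOND README (the build directory name is arbitrary, and the checkout must include the fix commit 199cd79 mentioned above):

git clone https://github.com/bbuchfink/diamond.git
cd diamond
mkdir build && cd build
cmake ..
make -j4
./diamond version # confirm the freshly built binary is the one the job actually runs, not an older copy on $PATH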

@DanielRivasMD

Hi,

I have a similar problem. I have been using Diamond, first v2.0.9 and now v2.0.12, to search for a set of sequences in a collection of assemblies. The query sequences I collected myself; there are about 150 of them, around 1500 bp each. Things work fine for the most part. However, on some of the larger assemblies I run out of memory, even when using 1000G. Initially I used block-size 6 and index-chunks 1, but after reading the comments above I changed this to block-size 2 and index-chunks 4. The documentation mentions that these parameters are pivotal for performance and memory usage. Should I understand from this statement that if I tune them down, trading off performance, memory usage will be reduced?

It is worth mentioning that I am also using frameshift 15, as we discussed in #458.

This is my current setup:

gzip --decompress --stdout ${inDir}/${assemblyT}.fasta.gz | \
  diamond blastx \
    --db ${libraryDir}/${libraryT}.dmnd \
    --query - \
    --frameshift 15 \
    --block-size 2 \
    --index-chunks 4 \
    --out ${outDir}/${species}.tsv

Any suggestion would be highly appreciated.
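
For what it's worth, a sketch of the lower-memory direction the question implies, assuming (as the DIAMOND documentation suggests) that peak memory grows roughly with --block-size and that more --index-chunks lowers memory during seed indexing; as the replies below show, a single very long query can still exhaust memory regardless of these settings. The --threads value is a placeholder, not taken from this thread:

# Assumption: lower --block-size trades speed for lower peak memory; higher --index-chunks also reduces memory at some cost in speed.
gzip --decompress --stdout ${inDir}/${assemblyT}.fasta.gz | \
  diamond blastx \
    --db ${libraryDir}/${libraryT}.dmnd \
    --query - \
    --frameshift 15 \
    --block-size 1 \
    --index-chunks 4 \
    --threads 16 \
    --out ${outDir}/${species}.tsv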

@bbuchfink
Owner

How big are your assemblies and how many threads do you run?

@DanielRivasMD

Thanks for your reply. I run 16 CPU threads with 1000G of memory, but with less memory (128 GB) I could run 28 CPU threads. One of the assemblies is 3.6 G uncompressed.

@bbuchfink
Owner

How long is the longest contig?

@DanielRivasMD

DanielRivasMD commented Oct 25, 2021

For this particular assembly, these are the specs:

karyotype:            2n=18

contigN50:            107,955
totalContigLength:    3,499,615,818
longestContig:        1,055,336
numberOfContigs:      72,993

scaffoldN50:          524,289,849
totalScaffoldLength:  3,573,327,505
longestScaffold:      747,302,727
numberOfScaffolds:    5,136

@bbuchfink
Owner

The longest queries I tested were bacterial chromosomes, but queries of >700 Mbp can easily break the current code. I do plan to rework the blastx mode, probably in the next few weeks, but I can't offer you an easy solution right now. These options may work:

1. Extract ORFs and run the blastp mode on them.
2. Chop the sequences into overlapping ~100 kb windows and run blastx on them.
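
A minimal sketch of what the two options could look like on the command line, assuming seqkit and the EMBOSS getorf tool are available; neither tool, nor the window, step and ORF sizes, comes from this thread, so treat them as placeholders:

# Option 1: extract ORFs and align them in protein space with blastp (no frameshift handling needed there).
getorf -sequence assembly.fasta -outseq orfs.faa -find 1 -minsize 300
diamond blastp --db library.dmnd --query orfs.faa --out orf_hits.tsv

# Option 2: chop each scaffold into overlapping ~100 kb windows (90 kb step, i.e. 10 kb overlap), then run blastx.
# seqkit appends the window coordinates to each sequence ID, so hits can be mapped back to the original scaffolds.
seqkit sliding -s 90000 -W 100000 assembly.fasta > windows.fasta
diamond blastx --db library.dmnd --query windows.fasta --frameshift 15 --out windows_hits.tsv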

@DanielRivasMD

DanielRivasMD commented Oct 25, 2021

I see. I had thought about the second option. Another alternative I considered was to run each scaffold independently, but I guess this would not work since the problem seems to be the length, correct?

I will try as you suggest. Thanks a lot for your input, and please let me know when you update blastx.

@bbuchfink
Owner

You could try that too but I assume that the length is the problem.

May I also ask why extracting ORFs is not an option for you? Are you looking for alignments that span over stop codons?

@DanielRivasMD

I thought so.

I will definitely try extracting ORFs as well. I just had not thought about it.
