
Diamond Clustering randomly crashing. #747

Open
Thernn88 opened this issue Oct 20, 2023 · 1 comment

Thernn88 commented Oct 20, 2023

I'm using Diamond clustering as part of an alignment strategy, which means clustering can run over millions of files across a dataset. I run 32 instances of diamond in parallel, each with a single thread. The CPU has 64 cores/128 threads, so I'm not oversubscribing it.
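A setup like this, with 32 single-threaded diamond processes on a 64-core machine, can be sketched in Python as a thread pool that launches one subprocess per input. This is an assumption about how such a script might be organized, not the reporter's code; `run_all`, `run_one` and `MAX_PARALLEL` are hypothetical names. Keeping every return code in input order ties a rare failure to a specific file instead of losing it among millions of runs.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 32  # 32 concurrent single-threaded processes, as in the report


def run_one(cmd: list[str]) -> int:
    """Run one command to completion and return its exit code (output discarded)."""
    return subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).returncode


def run_all(cmds: list[list[str]], max_parallel: int = MAX_PARALLEL) -> list[int]:
    """Run commands with at most max_parallel at once; return codes in input order."""
    # Threads are enough here: each worker only waits on its own child process.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, cmds))
```

Pairing the codes with the input paths, e.g. `[p for p, rc in zip(paths, codes) if rc != 0]`, gives the list of files whose run did not exit cleanly.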

At random, diamond terminates abruptly mid-run: the log simply stops, with no error message. It's never the same file. To catch it, I have to run the script in a bash loop until it happens. I estimate the failure rate at about 1 in 100,000 diamond runs.

diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Opening the input file... [0s]
Input database: /run/shm/uce-12426_gow28_ui/tmp_in.fa (2 sequences, 65 letters)
Temporary directory: /run/shm/uce-12426_gow28_ui
#Target sequences to report alignments for: unlimited
Database: /run/shm/uce-12426_gow28_ui/tmp_in.fa (type: FASTA file, sequences: 2, letters: 65)
Block size = 3200000000
Opening the input file... [0s]
Opening the output file... [0s]
Seeking in database... [0s]
Loading query sequences... [0s]
Length sorting queries... [0s]
Algorithm: Double-indexed
Building query histograms... [0s]
Seeking in database... [0s]
Seeking in database... [0s]
Initializing temporary storage... [0s]
Building reference histograms... [0s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/1.
Building reference seed array... [0s]
Building query seed array... [0s]
Computing hash join... [0s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Deallocating buffers... [0s]
Clearing query masking... [0s]
Computing alignments... Loading trace points... [0s]
Sorting trace points... [0s]
Computing alignments...
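The bash-loop reproduction described above can be sketched in Python. This is a minimal sketch assuming a POSIX system; `describe_returncode` and `run_until_failure` are hypothetical helpers and nothing in them is specific to diamond. On POSIX, `subprocess` reports a child killed by signal N as return code -N, which separates a signal kill (for example SIGSEGV) from an ordinary non-zero exit. A log that just stops cannot show that difference; whether diamond is actually being killed by a signal here is not established.

```python
from __future__ import annotations

import signal
import subprocess


def describe_returncode(rc: int) -> str:
    """Turn a subprocess return code into a short human-readable status."""
    if rc == 0:
        return "ok"
    if rc < 0:
        # POSIX only: a negative code -N means the child was killed by signal N.
        try:
            return f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            return f"killed by signal {-rc}"
    return f"exited with status {rc}"


def run_until_failure(cmd: list[str], max_iters: int) -> tuple[int, str] | None:
    """Rerun cmd until it fails; return (iteration, status), or None if it never did."""
    for i in range(1, max_iters + 1):
        # Output is discarded to keep a long loop quiet.
        rc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).returncode
        status = describe_returncode(rc)
        if status != "ok":
            return i, status
    return None
```

Passing the same argument list the pipeline already uses keeps the loop identical to a real run.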

Possibly it's some sort of underflow error: it might be crashing when a file contains too few sequences to cluster (the file above has only 2). I will try raising the minimum number of sequences required before clustering runs.

uce-12426.aa.zip

Edit: I added a lower limit so that only files with at least 10 sequences go through clustering, and that appears to have stopped the crashes. I will test how low the limit can go. (Limbo time!)
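The workaround in this edit amounts to counting the records in each FASTA file and skipping clustering below a threshold. A minimal sketch, assuming each record starts with a `>` header line; `count_fasta_records`, `should_cluster` and the `min_seqs` parameter are hypothetical names, and the default of 10 is the limit stated in the edit. This only avoids the small inputs (the failing file in the log had 2 sequences); it does not identify the root cause.

```python
from typing import Iterable

MIN_SEQS_FOR_CLUSTERING = 10  # lower limit from the workaround described in the edit


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def should_cluster(lines: Iterable[str], min_seqs: int = MIN_SEQS_FOR_CLUSTERING) -> bool:
    """True if the input has at least min_seqs sequences."""
    return count_fasta_records(lines) >= min_seqs
```

With a real file, `with open(path) as fh: ok = should_cluster(fh)`, and the `diamond cluster` call is skipped when `ok` is False.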

Edit2: Here is the command I am running.

diamond cluster -d {tmp_in.name} -o {tmp_result.name} --approx-id 85 --member-cover 65 --threads 1 --quiet
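The `{tmp_in.name}` placeholders suggest the command is built as a Python f-string around `tempfile` objects; that is an assumption. If so, one option is to pass an argument list to `subprocess` instead of a shell string, so paths need no quoting, and to check the exit status rather than relying on the log. A minimal sketch using only the flags in the command above; `build_cluster_cmd` and `cluster` are hypothetical names.

```python
import subprocess


def build_cluster_cmd(db_path: str, out_path: str) -> list:
    """Argument list equivalent to the command above (same flags and values)."""
    return [
        "diamond", "cluster",
        "-d", db_path,
        "-o", out_path,
        "--approx-id", "85",
        "--member-cover", "65",
        "--threads", "1",
        "--quiet",
    ]


def cluster(db_path: str, out_path: str) -> None:
    """Run diamond cluster; raises CalledProcessError on a non-zero exit or a signal kill."""
    subprocess.run(build_cluster_cmd(db_path, out_path), check=True)
```

With the same temporary files, the call would be `cluster(tmp_in.name, tmp_result.name)`.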

bbuchfink (Owner) commented:

I'll try to reproduce it, but it may not be that easy.
