Diamond Clustering randomly crashing. #747

Thernn88 · 2023-10-20T00:58:39Z

I'm using Diamond clustering as part of an alignment strategy. This means I can run clustering over millions of files across a dataset. I run 32 instances of diamond in parallel. The CPU has 64c/128p so I'm not overthreading.

Randomly, diamond is abruptly terminating mid-run. The log files show it simply stops. It's never the same file. I have to trigger it by running the script in a bash loop to catch it. Error rate is estimated at 1/100000 iterations of diamond.

diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Opening the input file... [0s]
Input database: /run/shm/uce-12426_gow28_ui/tmp_in.fa (2 sequences, 65 letters)
Temporary directory: /run/shm/uce-12426_gow28_ui
#Target sequences to report alignments for: unlimited
Database: /run/shm/uce-12426_gow28_ui/tmp_in.fa (type: FASTA file, sequences: 2, letters: 65)
Block size = 3200000000
Opening the input file... [0s]
Opening the output file... [0s]
Seeking in database... [0s]
Loading query sequences... [0s]
Length sorting queries... [0s]
Algorithm: Double-indexed
Building query histograms... [0s]
Seeking in database... [0s]
Seeking in database... [0s]
Initializing temporary storage... [0s]
Building reference histograms... [0s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/1.
Building reference seed array... [0s]
Building query seed array... [0s]
Computing hash join... [0s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Deallocating buffers... [0s]
Clearing query masking... [0s]
Computing alignments... Loading trace points... [0s]
Sorting trace points... [0s]
Computing alignments...

Possibly, it's some sort of underflow error. It might be crashing when there are too few sequences in a file to cluster. I will try upping the requirement for clustering to occur.

uce-12426.aa.zip

Edit: I implemented a lower limit of 10 sequences to pass through clustering and that appears to have stopped the crashes. I will test how low I can go. (Limbo time!)

Edit2: Here is the command I am running.

diamond cluster -d {tmp_in.name} -o {tmp_result.name} --approx-id 85 --member-cover 65 --threads 1 --quiet

bbuchfink · 2023-10-23T12:49:55Z

I'll try to reproduce it but it may not be that easy.

Thernn88 mentioned this issue Oct 20, 2023

Running multiple diamond searches in parallel on the same machine #732

Open

beazerj mentioned this issue Mar 29, 2024

When running diamond blastp in multiprocessing mode, some processes hang or segfault non deterministically. #801

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diamond Clustering randomly crashing. #747

Diamond Clustering randomly crashing. #747

Thernn88 commented Oct 20, 2023 •

edited

Loading

bbuchfink commented Oct 23, 2023

Diamond Clustering randomly crashing. #747

Diamond Clustering randomly crashing. #747

Comments

Thernn88 commented Oct 20, 2023 • edited Loading

bbuchfink commented Oct 23, 2023

Thernn88 commented Oct 20, 2023 •

edited

Loading