database search order (diamond blast) #210

gbdias · 2024-03-18T15:56:24Z

Hi, thanks for this amazing tool!

I was trying to understand the execution order of the database searches. If I understand correctly, it is:

diamond on reference proteomes from UniProt, then
blastn on nt, only for the contigs without any hits from the previous step.

However, in your 2017 publication it seems the order is reversed, with blastn run on all sequences and diamond as a second pass for those without any hits.

If my understanding is correct, could you explain why running the search against reference proteomes first would be advantageous?


rjchallis · 2024-05-30T07:35:44Z

Hi, sorry I missed this. Yes, the search order has changed. blastn searches against nt are useful for very short contigs but inefficient for longer sequences. As sequencing and assembly have improved, most assembled sequences carry enough information to get good hits from diamond blast searches against reference proteomes. Running blastn only on the sequences without diamond blast hits therefore speeds up the process considerably.
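The two-pass logic described above can be sketched as a simple filter: collect the query IDs that received at least one hit in the first pass, then forward only the remaining contigs to the second pass. This is a minimal illustration, not the pipeline's actual code. It assumes the first-pass results are in BLAST-style tabular format (as produced by DIAMOND's default output format 6), where the first tab-separated column is the query ID; the function names and contig/accession IDs are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def query_ids_with_hits(hit_lines: Iterable[str]) -> Set[str]:
    """Collect query IDs that have at least one hit.

    Assumes BLAST-style tabular lines where column 1 is the query ID
    (qseqid), e.g. DIAMOND's default output format 6.
    """
    hit_ids: Set[str] = set()
    for line in hit_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit_ids.add(line.split("\t", 1)[0])
    return hit_ids


def split_for_second_pass(
    contig_ids: List[str], hit_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split contigs (keeping input order) into those resolved by the
    first pass and those to forward to the second pass."""
    hit_ids = query_ids_with_hits(hit_lines)
    resolved = [c for c in contig_ids if c in hit_ids]
    remaining = [c for c in contig_ids if c not in hit_ids]
    return resolved, remaining


if __name__ == "__main__":
    # Hypothetical first-pass (diamond) output: ctg1 and ctg3 have hits.
    contigs = ["ctg1", "ctg2", "ctg3"]
    diamond_out = [
        "ctg1\tQ9XYZ1\t87.2",
        "ctg1\tQ9XYZ2\t80.0",
        "ctg3\tP00001\t45.0",
    ]
    resolved, remaining = split_for_second_pass(contigs, diamond_out)
    print("resolved:", resolved)                 # contigs with diamond hits
    print("send to second pass:", remaining)     # contigs for blastn vs nt
```

Swapping the order (blastn on everything first, diamond as the second pass, as described in the 2017 publication) only changes which tool's output is passed in as `hit_lines`, since blastn's `-outfmt 6` also puts the query ID in the first column.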

gbdias · 2024-05-30T12:34:42Z

Hi @rjchallis thanks for the info!

We had a tricky case with a phylum (Nemertea) for which, at the time we ran blobtools, there were very few genomes and no reference proteomes available. As a result, our contigs were classified as a mix of equally distant phyla (from Arthropoda to Echinodermata, Chordata, Mollusca, etc.).

Today there's still only a handful of Nemertean genomes, but there is a single reference proteome contributed by the NCBI automatic pipeline, so maybe I should try again.

In such cases, I guess it could be useful to get blastn results on the whole genome first, since the available unannotated Nemertean genomes could be sufficient to assign contigs to the right phylum. But I understand the rationale for the pipeline change. 👍

gbdias changed the title from "diamond and blast order" to "database search order (diamond blast)" on Mar 18, 2024
