centrifuge-build hanging up #199

GastonViarengo · 2020-09-13T15:25:32Z

Hello everyone. I've recently started using Centrifuge, and I've been able to create a viral index and use it with my metagenomic data. However, when I try to build a bacterial index (bac), the process hangs (at least that's the only explanation I've found so far). I'm using the following command:

centrifuge-build -p 8 --conversion-table seqid2taxid.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp inputs/seq_bac.fna indices/bac

The files bac.1.cf, bac.2.cf, and bac.3.cf are created within a few minutes of the job starting, but bac.2.cf is 0 KB in size. The output shows:

Settings:
Output files: "indices/bac.*.cf"
Line rate: 7 (line is 128 bytes)
Lines per side: 1 (side is 128 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
inputs/seq_bac.fna
Reading reference sizes
Warning: Encountered reference sequence with only gaps
Time reading reference sizes: 00:07:04
Calculating joined length
Writing header
Reserving space for joined string
Could not allocate space for a joined string of 67127059294 elements.
Switching to a packed string representation.
Reading reference sizes
Warning: Encountered reference sequence with only gaps
Time reading reference sizes: 00:07:04
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:07:05
Warning: taxomony id doesn't exists for NC_017270.1! (repeated several times for different IDs)
Warning: Taxonomy ID 90270 is not in the provided taxonomy tree (taxonomy/nodes.dmp)! (repeated several times for different IDs)

Even after leaving it running for a few days, the bac.*.cf files show no modifications and the output is frozen (I believe the process has hung).

I've tried removing the erroneous IDs, but the process still hangs.

Could you help me understand what's going on in order to solve this?

Thank you so much!

Best regards

Prof. Dr. Gastón Viarengo
Institute of Molecular and Cellular Biology of Rosario (IBR-CONICET)
Human Virology Lab

mourisl · 2020-10-22T18:30:56Z

Sorry for the delayed reply, which version of Centrifuge did you use? Thank you.

GastonViarengo · 2020-10-23T12:05:10Z

Hello Li Song, no problem, thanks for your response. I'm using version 1.0.4-beta. Could you help me find the problem? Thank you.

mourisl · 2020-10-23T19:46:59Z

I just checked the log and realized that I fixed this bug after the release of 1.0.4-beta. Can you try git clone to get the most recent version of Centrifuge? Thank you.

GastonViarengo · 2020-10-28T14:39:09Z

Thanks Li Song, I'll try that and let you know how it goes. What was the bug? Best, Gastón.

fanninpm · 2021-05-03T13:37:33Z

I also ran into this (or a similar issue) while I was using the provided Makefile to make an nt database. Compiling 65c42fc from source did not change anything.

choede · 2021-12-14T08:38:55Z

Hi, I have a similar issue with nt. I'm using version 1.0.4. I modified the map file so that it starts like this: accession.version taxid
A00001.1 10641
A00002.1 9913
A00003.1 9913
A00004.1 32630
A00005.1 32630
and launched:
centrifuge-build -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.2map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp nt.fa nt

After one hour, the process does not write anything else. nt.1.cf and nt.3.cf are not empty, but nt.2.cf is empty. I have only warnings in the output logs. The process uses only one CPU. Moreover, the nt indexes available on the Centrifuge website are not up to date (they are from 2018). Could you help me, please?
Thanks a lot in advance

Jolvii85 · 2022-01-01T05:26:35Z

Hi all, I have the same error with nt. Has anyone fixed it?

savytskanatalia · 2022-05-23T08:45:27Z

Hi all, I have a similar problem with a custom database. Did anyone figure it out?

Jolvii85 · 2022-05-24T13:36:39Z

I finally gave up!

wittler-github · 2022-10-20T12:26:14Z

For me, the error "Warning: taxomony id doesn't exists for NC_0####.1!" (repeated several times for different IDs) occurred because, when I concatenated several seqid2taxid.map files, a newline was sporadically missing at the junction between two files. That made centrifuge-build miss all the NCBI taxid entries after that point.
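
The missing-newline problem described above can be avoided, or detected after the fact, with a small script. The following is only a minimal sketch: the function names `concat_maps` and `find_bad_lines` are hypothetical, and it assumes each map line holds a sequence ID and a numeric taxid separated by whitespace.

```python
def concat_maps(paths, out_path):
    """Concatenate map files, adding a newline wherever a file lacks a final one."""
    with open(out_path, "wb") as out:
        for path in paths:
            last = b"\n"  # an empty file needs no repair
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks so large map files are not loaded whole.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")  # repair the junction between two files


def find_bad_lines(map_path):
    """Return (line_number, line) pairs that do not look like 'seqid taxid'."""
    bad = []
    with open(map_path) as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            # Assumption: exactly two fields, the second a numeric taxid.
            if len(fields) != 2 or not fields[1].isdigit():
                bad.append((number, line.rstrip("\n")))
    return bad
```

Running `find_bad_lines` on an already concatenated seqid2taxid.map should flag any line where two entries were fused together, since such a line ends up with three fields instead of two.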

ramnageena11 · 2022-10-25T16:14:05Z

Has anyone found a solution?
I have been stuck on this for the last 20 days.

Thanks
Ram

ramnageena11 · 2022-10-26T17:17:21Z

Hello, any suggestions?

ramnageena11 · 2022-10-31T19:55:02Z

Hi
It seems I need to change my strategy for analyzing my data. Any suggestions other than Centrifuge? I am using long-read data from ONT; will Kraken2 work for taxonomic analysis?

Please advise.
Thanks
RNS

ramnageena11 · 2022-11-02T20:06:44Z

hi

sarah-buddle · 2023-09-12T12:04:42Z

Hi, have there been any updates on this issue? I am encountering the same thing.

mourisl · 2023-09-12T14:15:57Z

How much memory do you have on your server and which database are you building? Thank you.

sarah-buddle · 2023-09-12T14:32:35Z

I am trying to build a custom database from bacterial, viral, fungal and protozoan sequences downloaded from RefSeq. I'm running Centrifuge v1.0.4, and have tried both the conda installation and a build from source. The total size of my FASTA file is 148 GB. On my last attempt, I used 80 GB of memory and 8 cores. I didn't get any error messages about running out of memory; I just got warnings, e.g. "Warning: taxonomy id doesn't exists for NCxxx" as above, and the output file refseq.4.cf was empty. I have access to more memory, though, so I could try with that. The command I used to build was:
centrifuge-build --conversion-table ${db}/seqid2taxid.map --taxonomy-tree ${software}/taxdump/new_taxdump_2023-08-01/nodes.dmp --name-table ${software}/taxdump/new_taxdump_2023-08-01/names.dmp ${db}/refseq_all_genomic.fasta refseq -p 8

mourisl · 2023-09-12T14:35:34Z

With 148 GB of sequence, I think you may need about 600 GB of memory to build the index. You can increase the --dcv and --bmax values to reduce memory usage, but the build may take longer.
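
As a rough sanity check before starting a long build, the figure above can be turned into a back-of-the-envelope estimate. This is only a sketch under a loud assumption: the ratio is derived from the single data point quoted in this thread (about 600 GB of memory for 148 GB of sequence) and is not a documented Centrifuge formula. Actual usage will also depend on options such as --bmax and --dcv. All function names here are hypothetical.

```python
import os

# Assumption: ratio taken from the one figure quoted in this thread
# (about 600 GB of memory for 148 GB of sequence), not from Centrifuge docs.
ASSUMED_MEMORY_RATIO = 600 / 148


def fasta_size_gb(path):
    """Size of a FASTA file on disk, in gigabytes (10**9 bytes)."""
    return os.path.getsize(path) / 1e9


def estimated_build_memory_gb(sequence_gb, ratio=ASSUMED_MEMORY_RATIO):
    """Very rough memory estimate for building an index from sequence_gb of input."""
    return sequence_gb * ratio


def likely_to_fit(sequence_gb, available_gb, ratio=ASSUMED_MEMORY_RATIO):
    """True if the rough estimate does not exceed the available memory."""
    return estimated_build_memory_gb(sequence_gb, ratio) <= available_gb
```

Under this assumed ratio, `likely_to_fit(148, 80)` is false, which would be consistent with the build described above stalling on an 80 GB allocation.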

sarah-buddle · 2023-09-12T14:36:01Z

OK thank you, I will try that!

This was referenced Sep 16, 2020

Could not locate a Centrifuge index corresponding to basename "..." Error: Encountered internal Centrifuge exception (#1) #144

Open

make nt database index #50

Open

fanninpm mentioned this issue Oct 24, 2022

Database download for Centrifuge #242

Open
