centrifuge-build hanging up #199
Comments
Sorry for the delayed reply, which version of Centrifuge did you use? Thank you. |
Hello Li Song, no problem, thanks for your response. I'm using version 1.0.4-beta. Could you help me find the problem? Thank you. |
I just checked the log and realized that I fixed this bug after the release of 1.0.4-beta. Can you try git clone to get the most recent version of Centrifuge? Thank you. |
Thanks Li Song, I'll try that and let you know how it goes. What was the bug? Best, Gastón. |
I also ran into this (or a similar issue) while I was using the provided Makefile to make an |
Hi, I have a similar issue with nt. I'm using version 1.0.4. I modified the map file so that its lines start with: accession.version taxid. After one hour, the process does not write anything else. nr.1.cf and nt.3.cf are not empty but nt.2.cg is empty. I have only warnings in the output logs. The process uses only one CPU. Moreover, the nt indexes available on the Centrifuge website are not up to date (they are from 2018). Could you help me, please? |
Hi all, I have the same error with nt. Has anyone fixed it? |
Hi all, I have a similar problem with a custom database. Did anyone figure it out? |
I gave up finally!
|
For me, the error "Warning: taxomony id doesn't exists for NC_0####.1!" (repeated several times for different IDs) occurred because, when I concatenated several seqid2taxid.map files, a newline was sporadically missing at the junction between two files. That made centrifuge-build miss all the NCBI taxid entries after that point. |
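The missing-newline problem described above can be avoided by merging the map files line by line instead of concatenating raw bytes. The following is a minimal sketch, not part of Centrifuge: the function names `merge_map_files` and `find_malformed_lines` are hypothetical, and it assumes each record is a sequence ID and a numeric taxid separated by whitespace.

```python
from typing import Iterable, List


def merge_map_files(paths: Iterable[str], out_path: str) -> int:
    """Concatenate seqid-to-taxid maps, writing exactly one record per line.

    Returns the number of records written.
    """
    written = 0
    with open(out_path, "w", newline="\n") as out:
        for path in paths:
            with open(path) as handle:
                # Re-adding "\n" to every record repairs a missing final
                # newline, so the last record of one file cannot fuse with
                # the first record of the next file.
                for line in handle:
                    record = line.strip()
                    if not record:
                        continue  # skip blank lines
                    out.write(record + "\n")
                    written += 1
    return written


def find_malformed_lines(map_path: str) -> List[int]:
    """Return 1-based numbers of lines that are not 'seqid taxid'.

    Assumption: a valid line has exactly two whitespace-separated fields
    and the second one is a numeric taxid. Blank lines are reported too.
    """
    bad = []
    with open(map_path) as handle:
        for number, line in enumerate(handle, start=1):
            fields = line.split()
            if len(fields) != 2 or not fields[1].isdigit():
                bad.append(number)
    return bad
```

Running `find_malformed_lines` on an already concatenated map is a quick way to locate a fused junction, since a fused line ends up with three fields instead of two.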
Is there any solution? Has anyone found one? Thanks. |
Hello, any suggestions? |
Hi, please suggest a solution. |
hi |
Hi, have there been any updates on this issue? I am encountering the same thing. |
How much memory do you have on your server and which database are you building? Thank you. |
I am trying to build a custom database based on bacteria, viral, fungi and protozoa downloaded from RefSeq. I'm running centrifuge v1.0.4, and have tried with the conda installation and installed from source. The total size of my fasta file is 148GB. On my last attempt to build, I tried with 80GB of memory and 8 cores. I didn't get any error messages about running out of memory, I just got warnings e.g. "Warning: taxonomy id doesn't exists for NCxxx" as above, and the output file refseq.4.cf was empty. I have access to more memory though, so I could try with that. The command I used to build was: |
With 148 GB of sequence, I think you may need about 600 GB of memory to build the index. You can increase the --dcv and --bmax values to reduce the memory, but the build may take longer. |
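The estimate above works out to roughly four times the sequence size (600 / 148 ≈ 4.05). As a back-of-the-envelope sketch only, the same ratio can be applied to other inputs. The ratio is an assumption taken from this single data point, not a documented Centrifuge formula, and the function names are hypothetical.

```python
# Ratio implied by the estimate in this thread: about 600 GB of memory
# for 148 GB of sequence. This is an assumption from one data point.
THREAD_RATIO = 600 / 148


def estimate_build_memory_gb(sequence_gb: float, ratio: float = THREAD_RATIO) -> float:
    """Rough memory estimate (GB) for building an index from sequence_gb of FASTA."""
    return sequence_gb * ratio


def likely_fits(sequence_gb: float, available_gb: float) -> bool:
    """True if the rough estimate does not exceed the available memory."""
    return estimate_build_memory_gb(sequence_gb) <= available_gb


if __name__ == "__main__":
    needed = estimate_build_memory_gb(148)
    print(f"148 GB of sequence: about {needed:.0f} GB of memory")
    print("Fits in 80 GB:", likely_fits(148, 80))
```

Under this rule of thumb, the 80 GB used in the attempt described above is far below the estimated requirement.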
OK thank you, I will try that! |
Hello everyone. I've recently started using Centrifuge, and I've been able to create a viral index and use it with my metagenomic data. However, when I'm trying to build a bacteria index (bac), the process hangs up (at least that's the only explanation I've encountered so far). I'm using the following script:
centrifuge-build -p 8 --conversion-table seqid2taxid.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp inputs/seq_bac.fna indices/bac
The files bac.1.cf, bac.2.cf, and bac.3.cf are created within a few minutes after the job begins, but bac.2.cf is 0 KB in size. The output shows:
Settings:
Output files: "indices/bac.*.cf"
Line rate: 7 (line is 128 bytes)
Lines per side: 1 (side is 128 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
inputs/seq_bac.fna
Reading reference sizes
Warning: Encountered reference sequence with only gaps
Time reading reference sizes: 00:07:04
Calculating joined length
Writing header
Reserving space for joined string
Could not allocate space for a joined string of 67127059294 elements.
Switching to a packed string representation.
Reading reference sizes
Warning: Encountered reference sequence with only gaps
Time reading reference sizes: 00:07:04
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:07:05
Warning: taxomony id doesn't exists for NC_017270.1! (repeated several times for different IDs)
Warning: Taxonomy ID 90270 is not in the provided taxonomy tree (taxonomy/nodes.dmp)! (repeated several times for different IDs)
Even after leaving it running for a few days, the bac.*.cf files show no modifications and the output is frozen (I believe the process has hung).
I've tried removing the erroneous IDs, but the process still hangs.
Could you help me understand what's going on in order to solve this?
Thank you so much!
Best regards
Prof. Dr. Gastón Viarengo
Institute of Molecular and Cellular Biology of Rosario (IBR-CONICET)
Human Virology Lab
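The two warnings in the log above (a sequence ID with no taxonomy ID, and a taxonomy ID missing from nodes.dmp) can be checked before starting a long build. The sketch below is an illustration, not a Centrifuge tool: `check_inputs` and its helpers are hypothetical names, and it assumes that the sequence ID is the first whitespace-delimited token of each FASTA header, that the conversion table has the sequence ID and taxid as its first two fields, and that nodes.dmp uses the NCBI `|`-delimited layout with the taxid in the first column.

```python
from typing import Dict, List, Set, Tuple


def read_fasta_ids(fasta_path: str) -> List[str]:
    """Collect the first token of every '>' header line."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.append(fields[0])
    return ids


def read_conversion_table(map_path: str) -> Dict[str, str]:
    """Map sequence ID -> taxid (kept as a string)."""
    table = {}
    with open(map_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                table[fields[0]] = fields[1]
    return table


def read_taxonomy_ids(nodes_path: str) -> Set[str]:
    """Collect the first '|'-delimited column of nodes.dmp."""
    with open(nodes_path) as handle:
        return {line.split("|")[0].strip() for line in handle if line.strip()}


def check_inputs(fasta_path: str, map_path: str, nodes_path: str) -> Tuple[List[str], List[str]]:
    """Return (sequence IDs without a taxid, taxids absent from the tree)."""
    table = read_conversion_table(map_path)
    tree_ids = read_taxonomy_ids(nodes_path)
    fasta_ids = read_fasta_ids(fasta_path)
    unmapped = [seq_id for seq_id in fasta_ids if seq_id not in table]
    used_taxids = {table[seq_id] for seq_id in fasta_ids if seq_id in table}
    missing_taxa = sorted(used_taxids - tree_ids)
    return unmapped, missing_taxa
```

With the paths from the command above, a call would look like `check_inputs("inputs/seq_bac.fna", "seqid2taxid.map", "taxonomy/nodes.dmp")`. Two empty lists mean neither warning should be triggered by these inputs under the stated assumptions. This does not by itself explain a hang, which the thread attributes to a bug fixed after 1.0.4-beta or to insufficient memory.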