The second file of custom db is empty #264

LilyAnderssonLee · 2023-11-07T08:04:19Z

Hi, I am in the process of building a database from RefSeq data covering bacteria, viruses, archaea, fungi, parasites, protozoa, plasmids, and even contaminants. The input data is quite large, around 1.3 TB.

However, I've run into an issue where the second index file, db.2.cf, always turns out empty. Has anyone else had this problem? Here is the script I've been using:

#!/bin/bash
#SBATCH -A xx
#SBATCH -p core
#SBATCH -n 50
#SBATCH -t 10-00:00:00
#SBATCH -J centrifuge_db
#SBATCH --mem=400GB
centrifuge-build -p 50 --bmax 3342177280 --conversion-table seqid2taxid.map \
    --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
    input-sequences.fna db
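
A quick way to confirm the symptom after a build is to check every index file for zero size. This is a minimal sketch, assuming the index files are named `<prefix>.N.cf` as with `db.2.cf` above; `check_index_files` is a hypothetical helper and not part of Centrifuge.

```bash
#!/bin/bash
# Hypothetical helper: report Centrifuge index files that are empty.
# Assumes index files follow the <prefix>.N.cf naming seen in this thread.
# Returns non-zero if any file is empty or if no index file is found.
check_index_files() {
    local prefix="$1" status=0 found=0 f
    for f in "$prefix".*.cf; do
        [ -e "$f" ] || continue      # the glob matched nothing
        found=1
        if [ ! -s "$f" ]; then       # -s is true only for non-empty files
            echo "EMPTY: $f"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "NO INDEX FILES: $prefix.*.cf"
        status=1
    fi
    return "$status"
}
```

Calling `check_index_files db` at the end of the SLURM script would make the job exit status reflect an empty index file rather than leaving it to be noticed later.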


mourisl · 2023-11-07T19:57:58Z

I think that for 1.3 TB of sequences, you may need about 3 TB of memory to build the index...
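
The arithmetic behind that estimate can be sketched as a pre-flight check. The ratio of 3/1.3 (about 2.3x the input size) is inferred only from the rough figure above; it is an assumption, not a documented Centrifuge formula, and `estimate_build_mem_gb` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: rough memory estimate (GB) for a given input size (GB).
# ASSUMPTION: memory scales linearly with input at the 3 TB / 1.3 TB ratio
# quoted in this thread; real usage may differ.
estimate_build_mem_gb() {
    local input_gb="$1"
    awk -v s="$input_gb" 'BEGIN { printf "%.0f\n", s * 3 / 1.3 }'
}

# Hypothetical helper: succeed only if the estimate fits in the requested memory.
fits_in_memory() {
    local input_gb="$1" requested_gb="$2" needed_gb
    needed_gb=$(estimate_build_mem_gb "$input_gb")
    [ "$needed_gb" -le "$requested_gb" ]
}
```

For the job above, `fits_in_memory 1300 400` fails, since the estimate for 1300 GB of input is about 3000 GB against a 400 GB request.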

LilyAnderssonLee · 2023-11-09T11:50:19Z

@mourisl Thanks for your response. Unfortunately, I don't have that much memory available. I suppose I'll need to reduce the data size, perhaps by keeping only the representative genome for each species.
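
One possible way to do that reduction is to filter an NCBI `assembly_summary.txt` on its `refseq_category` column. This is a sketch under assumptions: it relies on the usual tab-separated layout of that file (column 5 `refseq_category`, column 7 `species_taxid`, column 20 `ftp_path`), which should be checked against the file's own header, and `select_representatives` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: keep only reference/representative assemblies.
# ASSUMPTION: input uses the NCBI assembly_summary.txt column layout
# (5 = refseq_category, 7 = species_taxid, 20 = ftp_path).
# Reads the named files, or stdin if none are given.
# Prints: assembly accession, species taxid, FTP path (tab-separated).
select_representatives() {
    awk -F'\t' '
        /^#/ { next }    # skip header and comment lines
        $5 == "reference genome" || $5 == "representative genome" {
            print $1 "\t" $7 "\t" $20
        }' "$@"
}
```

Running `select_representatives assembly_summary.txt > representatives.tsv` would give a list of assemblies to download instead of the full set.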

LilyAnderssonLee · 2023-11-22T10:28:57Z

@mourisl What k-mer length is used for genome compression in the Centrifuge h+p+v+c database, and what is the default k-mer length in database construction?

Are you planning to update the Centrifuge databases or create Centrifuge databases based on all RefSeq genomes?

mourisl · 2023-11-22T15:22:09Z

Centrifuge itself does not use k-mers. The compression step uses 31-mers, but only to cluster the more similar strains within a species, so the k-mer information is not directly used in the compression either.

The recent RefSeq prokaryotic genomes are too large: the index size is above 80GB, which exceeds the Zenodo limit...

LilyAnderssonLee mentioned this issue Nov 17, 2023

Build centrifuge database genomic-medicine-sweden/taxprofiler#36


LilyAnderssonLee closed this as completed Dec 7, 2023
