The second file of custom db is empty #264
I think for 1.3TB of sequences, you may need about 3TB of memory to build the index...
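As a rough sizing aid, the sketch below scales the single figure from the comment above (about 3TB of RAM for about 1.3TB of input) linearly to other input sizes. The linear ratio is an assumption extrapolated from that one data point. It is not a documented centrifuge-build requirement, and the function names are hypothetical.

```python
# Rough sizing sketch. The ratio below is extrapolated from the single figure
# in the comment (~3 TB RAM for ~1.3 TB of input); it is an assumption,
# not a documented centrifuge-build requirement.
RAM_TO_INPUT_RATIO = 3.0 / 1.3  # roughly 2.3x


def estimated_build_ram_tb(input_tb: float) -> float:
    """Rough RAM estimate (in TB) for indexing `input_tb` TB of sequence."""
    if input_tb < 0:
        raise ValueError("input size must be non-negative")
    return input_tb * RAM_TO_INPUT_RATIO


def fits_in_memory(input_tb: float, available_tb: float) -> bool:
    """True if the rough estimate does not exceed the available RAM (TB)."""
    return estimated_build_ram_tb(input_tb) <= available_tb


if __name__ == "__main__":
    for size_tb in (1.3, 0.5, 0.1):
        print(f"{size_tb} TB input -> ~{estimated_build_ram_tb(size_tb):.1f} TB RAM")
```

Treat the result only as an order-of-magnitude check before committing to a long build; actual memory use may not scale linearly.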
@mourisl Thanks for your response. Unfortunately, I don't have that much memory available. I suppose I'll need to reduce the data size, perhaps by keeping only the representative genome for each species.
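One way to do the species-level reduction suggested in the comment above is to keep a single assembly per `species_taxid` from an NCBI `assembly_summary.txt` table. The sketch assumes that file's tab-separated layout (a `#`-prefixed header line with columns such as `assembly_accession`, `refseq_category`, `species_taxid` and `assembly_level`). The preference order (reference, then representative, then most complete assembly) is a hypothetical policy chosen for illustration, and NCBI has changed its category labels over time, so check the values in your own file.

```python
from typing import Dict, Iterable, List, Tuple

# Preference order used by this sketch (an assumption, not an NCBI or Centrifuge rule).
CATEGORY_RANK = {"reference genome": 0, "representative genome": 1}
LEVEL_RANK = {"Complete Genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}


def read_assembly_summary(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-separated rows; the last '#' line before the data is the header."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").strip().split("\t")
        else:
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def _rank(row: Dict[str, str]) -> Tuple[int, int]:
    # Lower is better: preferred RefSeq category first, then assembly completeness.
    return (CATEGORY_RANK.get(row.get("refseq_category", ""), len(CATEGORY_RANK)),
            LEVEL_RANK.get(row.get("assembly_level", ""), len(LEVEL_RANK)))


def one_genome_per_species(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the best-ranked assembly for each species_taxid (first one wins ties)."""
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        species = row["species_taxid"]
        if species not in best or _rank(row) < _rank(best[species]):
            best[species] = row
    return list(best.values())


if __name__ == "__main__":
    sample = (
        "#   See README_assembly_summary.txt for column descriptions\n"
        "#assembly_accession\trefseq_category\tspecies_taxid\tassembly_level\n"
        "GCF_A\tna\t562\tContig\n"
        "GCF_B\treference genome\t562\tComplete Genome\n"
        "GCF_C\tna\t1280\tScaffold\n"
        "GCF_D\tna\t1280\tComplete Genome\n"
    )
    kept = one_genome_per_species(read_assembly_summary(sample.splitlines()))
    print([row["assembly_accession"] for row in kept])
```

With a real file you would pass the open file handle (`with open("assembly_summary.txt") as fh: rows = read_assembly_summary(fh)`) and then download only the kept accessions before running centrifuge-build on the smaller set.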
Centrifuge itself does not use k-mers. For the compression step, it uses 31-mers, but only to cluster the more similar strains within a species, so the k-mer information is not directly used in the compression either. The recent set of RefSeq prokaryotic genomes is simply too large: the index is above 80GB, which exceeds Zenodo's size limit...
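To illustrate the general idea of grouping similar strains by shared 31-mers, the sketch below computes the Jaccard similarity of two genomes' 31-mer sets and greedily clusters genomes whose similarity to a cluster's first member passes a threshold. This is not Centrifuge's implementation; the greedy scheme, the 0.9 threshold, and all function names are assumptions for illustration. Exact Python sets of k-mers are also far too memory-hungry for whole bacterial genomes at scale.

```python
import random
from typing import Dict, List, Set

K = 31  # k-mer length mentioned in the comment


def kmer_set(seq: str, k: int = K) -> Set[str]:
    """All distinct k-mers of `seq` (no reverse-complement canonicalization)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_cluster(genomes: Dict[str, str], threshold: float = 0.9,
                   k: int = K) -> List[List[str]]:
    """Put each genome into the first cluster whose seed is >= threshold similar."""
    profiles = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    clusters: List[List[str]] = []
    for name in genomes:
        for cluster in clusters:
            if jaccard(profiles[cluster[0]], profiles[name]) >= threshold:
                cluster.append(name)
                break
        else:  # no similar seed found: start a new cluster
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    rng = random.Random(0)
    strain_a = "".join(rng.choice("ACGT") for _ in range(1000))
    # strain_b differs from strain_a by a single substitution at position 500
    strain_b = strain_a[:500] + ("C" if strain_a[500] != "C" else "G") + strain_a[501:]
    other = "".join(rng.choice("ACGT") for _ in range(1000))
    print(greedy_cluster({"strain_a": strain_a, "strain_b": strain_b, "other": other}))
```

A single substitution changes at most 31 of the roughly 970 distinct 31-mers in a 1,000-base sequence, so near-identical strains stay well above the threshold while unrelated sequences share almost none.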
Hi, I am in the process of building a database from RefSeq data that covers bacteria, viral, archaea, fungi, parasite, protozoa, and plasmid sequences, and even contaminants. The input data is quite large, around 1.3TB in size.
However, I've run into an issue where the second file, `db.2.cf`, always turns out empty. Has anyone else had this problem? Here is the code I've been using:
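Separately from the poster's build command, which was not captured on this page, a quick way to confirm which index files came out empty after a build is to list the size of every `<prefix>.*.cf` file. The sketch assumes the index files follow the `<prefix>.N.cf` naming implied by `db.2.cf`; the script and function names are hypothetical.

```python
import glob
import os
import sys
from typing import Dict, List


def cf_file_sizes(prefix: str) -> Dict[str, int]:
    """Map every '<prefix>.*.cf' index file to its size in bytes."""
    pattern = glob.escape(prefix) + ".*.cf"
    return {path: os.path.getsize(path) for path in sorted(glob.glob(pattern))}


def empty_cf_files(prefix: str) -> List[str]:
    """Index files that exist but contain zero bytes."""
    return [path for path, size in cf_file_sizes(prefix).items() if size == 0]


if __name__ == "__main__":
    # Usage: python check_cf.py /path/to/db  (the index prefix given to centrifuge-build)
    prefix = sys.argv[1] if len(sys.argv) > 1 else "db"
    sizes = cf_file_sizes(prefix)
    if not sizes:
        print(f"no files matching {prefix}.*.cf")
    for path, size in sizes.items():
        print(f"{path}\t{size} bytes")
    for path in empty_cf_files(prefix):
        print(f"EMPTY: {path}")
```

Running this right after each build attempt makes it easy to see whether only `db.2.cf` is affected or whether other index files are also incomplete.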