
Warning: taxomony id doesn't exists for NZ_AJTB01000092.1! #19

Closed
sridhar0605 opened this issue Aug 8, 2016 · 8 comments

Comments

@sridhar0605

Has anyone come across this error so far? Both my input sequence file and my seqid2taxa.map file contain this ID, but centrifuge-build still emits this warning.

@fbreitwieser
Collaborator

Can you show the relevant parts of your input files? Does the taxon exist in the taxonomy tree, too?

@sridhar0605
Author

I downloaded the latest taxonomy and extracted nodes.dmp and names.dmp using
tar -zxvf taxdump.tar.gz nodes.dmp names.dmp
Before that, let me brief you on my database.
I have all the genomic.fna.gz files for bacteria and viruses from ftp://ftp.ncbi.nlm.nih.gov/refseq/release/bacteria/
I tried using Kraken to build a database, but it failed badly due to memory issues, even on an Amazon EC2 instance with 2 TB of RAM. Hence I wanted to use Centrifuge, which suits my analysis perfectly.
So far I have concatenated all the sequences into a single .fna file and used the seqid2taxa.map file from Kraken as the initial input to Centrifuge, since Centrifuge does not download all the bacterial files. I modified the FASTA headers to meet Centrifuge's requirements (just the sequence ID and description). However, I end up getting this error:

Warning: taxomony id doesn't exists for NZ_AJTB01000101.1!

and then this one, too:
Warning: Taxonomy ID 1527292 is not in the provided taxonomy tree (taxonomy/nodes.dmp)!
I then used the nodes.dmp and names.dmp files from the Kraken output instead, but still had no success.
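To check the first warning independently of centrifuge-build, one can compare the sequence IDs in the concatenated FASTA file against the map directly. This is a minimal, hypothetical helper (not part of Centrifuge or Kraken); it assumes the map is tab-separated with the sequence ID in the first column and the taxonomy ID in the second, and that the sequence ID is the first whitespace-delimited token of each FASTA header.

```python
import sys


def fasta_ids(lines):
    """Yield the first whitespace-delimited token after '>' on each header."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def load_map(lines):
    """Return {seqid: taxid} from tab-separated lines (seqid, taxid, ...).

    Only the line ending is stripped, so IDs are compared exactly.
    """
    mapping = {}
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 2 and parts[0]:
            mapping[parts[0]] = parts[1]
    return mapping


def unmapped_ids(fasta_lines, map_lines):
    """Return FASTA sequence IDs that have no entry in the map."""
    mapping = load_map(map_lines)
    return [sid for sid in fasta_ids(fasta_lines) if sid not in mapping]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python unmapped_ids.py sequences.fna seqid2taxa.map
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as seqmap:
        for sid in unmapped_ids(fasta, seqmap):
            print(sid)
```

Because IDs are compared exactly, an ID that differs only by trailing whitespace or by its version suffix (for example `.1`) is still reported as unmapped.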

@fbreitwieser
Collaborator

This record has been removed from the NCBI nucleotide database (http://www.ncbi.nlm.nih.gov/nuccore/NZ_AJTB01000092.1). We usually detect such cases by missing entries in the taxonomy dump, which I think is what is happening here. Note that the assembly_summary and the taxonomy are not always in sync.
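Missing taxonomy entries like this can be listed before building by checking every taxonomy ID in the map against nodes.dmp. This is a minimal sketch under stated assumptions, not Centrifuge's own check: it assumes the map is tab-separated (sequence ID, taxonomy ID) and that nodes.dmp follows the NCBI taxdump layout, where fields are separated by `\t|\t` and the first field is the taxonomy ID.

```python
import sys


def taxids_in_nodes(lines):
    """Return the set of taxonomy IDs (first field) found in nodes.dmp lines."""
    ids = set()
    for line in lines:
        first = line.split("|", 1)[0].strip()
        if first:
            ids.add(first)
    return ids


def missing_taxa(map_lines, node_lines):
    """Return (seqid, taxid) pairs whose taxid is absent from nodes.dmp."""
    known = taxids_in_nodes(node_lines)
    missing = []
    for line in map_lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 2 and parts[1].strip() not in known:
            missing.append((parts[0].strip(), parts[1].strip()))
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python missing_taxa.py seqid2taxa.map taxonomy/nodes.dmp
    with open(sys.argv[1]) as seqmap, open(sys.argv[2]) as nodes:
        for seqid, taxid in missing_taxa(seqmap, nodes):
            print(f"{seqid}\t{taxid}")
```

Running it against the same nodes.dmp that is passed to centrifuge-build keeps the check consistent with the tree actually used for the index.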

@sridhar0605
Author

That is the issue: I am not using assembly_summary as my backbone. I am trying to build the index with all available sequences (plasmids, contigs, and scaffolds), around 42,080 bacterial species and 5,654 viral species in all.

@sridhar0605
Author

Any solution for this?

@mourisl
Collaborator

mourisl commented Aug 11, 2016

Can you show us the line for NZ_AJTB01000101.1 in the seqid2taxa.map file and the lines around it? Is the corresponding tax ID (1527292) in nodes.dmp and names.dmp?

@sridhar0605
Author

sridhar0605 commented Aug 11, 2016

Since I did not follow your online manual, I wrote my own script to build seqid2taxa.map (I took all accession IDs from the FASTA headers and got the tax IDs from NCBI). And yes, @fbreitwieser was right: the record has been removed from the database and hence is not in nodes.dmp. So the next question is: why is it still in the FASTA files on the RefSeq site, and how do I handle this when building the Centrifuge index?
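Building the map from the FASTA headers can be sketched as follows. This is a hypothetical helper, not the script used in this thread, and the lookup format is an assumption: it expects the tab-separated layout of NCBI's `*.accession2taxid` files (accession, accession.version, taxid, gi, with a header row) and FASTA headers that start with the accession.version.

```python
import sys


def build_map(fasta_lines, lookup_lines):
    """Return (map_lines, unresolved_ids) for the sequences in a FASTA file."""
    # Pass 1: the sequence ID is assumed to be the first token of each header.
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    wanted = set(ids)

    # Pass 2: stream the lookup table, keeping only rows we need. Column 2 is
    # accession.version and column 3 is taxid; the header row should not match.
    found = {}
    for line in lookup_lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 3 and parts[1] in wanted:
            found[parts[1]] = parts[2]

    map_lines = [f"{sid}\t{found[sid]}" for sid in ids if sid in found]
    unresolved = [sid for sid in ids if sid not in found]
    return map_lines, unresolved


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python build_map.py sequences.fna nucl_gb.accession2taxid > seqid2taxa.map
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as lookup:
        lines, unresolved = build_map(fasta, lookup)
    for line in lines:
        print(line)
    for sid in unresolved:
        print(f"no taxid found for {sid}", file=sys.stderr)
```

The FASTA IDs are collected first and the lookup table is then streamed, keeping only matching rows, because full accession2taxid tables are large. The unresolved accessions are the ones that would otherwise end up without a mapping.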

@fbreitwieser
Collaborator

The thing is that RefSeq and the taxonomy database are not always in the same state. In Centrifuge, sequences with no mapping are added to the database with taxonomy ID 0, though maybe we should just skip them. Either way, the database should build without problems even if some mappings are missing.
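If you would rather skip unmapped sequences than have them indexed under taxonomy ID 0, one option is to filter the inputs before running centrifuge-build. This is a hypothetical pre-filter, not a Centrifuge feature: it keeps a FASTA record only if its first header token appears in the tab-separated map and the mapped taxonomy ID is the first field of some line in nodes.dmp.

```python
import sys


def usable_ids(map_lines, node_lines):
    """Return sequence IDs whose mapped taxid exists in nodes.dmp."""
    known = {line.split("|", 1)[0].strip() for line in node_lines}
    known.discard("")  # ignore blank lines in nodes.dmp
    ok = set()
    for line in map_lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 2 and parts[1].strip() in known:
            ok.add(parts[0].strip())
    return ok


def filter_fasta(fasta_lines, keep):
    """Yield FASTA lines, skipping every record whose ID is not in keep."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keeping = bool(fields) and fields[0] in keep
        if keeping:
            yield line


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python filter_inputs.py sequences.fna seqid2taxa.map nodes.dmp > filtered.fna
    with open(sys.argv[2]) as seqmap, open(sys.argv[3]) as nodes:
        keep = usable_ids(seqmap, nodes)
    with open(sys.argv[1]) as fasta:
        sys.stdout.writelines(filter_fasta(fasta, keep))
```

The filtered FASTA can then be passed to centrifuge-build with the same map, for example `centrifuge-build --conversion-table seqid2taxa.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp filtered.fna index_name`, where the file and index names are placeholders.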
