Why is my abundance estimation value zero for all classifications? How can this be turned on? #158

Open
harisankar991 opened this issue Nov 21, 2018 · 32 comments

Comments

@harisankar991

No description provided.

@harisankar991 changed the title from "Why is my abundance estimation value zero for all classifications?" to "Why is my abundance estimation value zero for all classifications? How can this be turned on?" on Nov 21, 2018
@chilltrout

I'm also very intrigued by this. I went over the docs and I don't understand how to turn it on.
I'm assuming abundance estimation is always on, because I have gotten non-zero values with a very high-coverage genome on PacBio data, but never on my Nanopore data.

@harisankarsadasivan

Yes, makes sense. I faced the same with nanopore data, minion v9.4.1.

@shashibioinfo

Hi sir,
I have the same issue.
I analyzed my MinION Nanopore data with Centrifuge, and the output files show the abundance as zero.

How can I resolve this?
Any suggestions would be appreciated.

Thank you

@shashibioinfo

> Yes, makes sense. I faced the same with nanopore data, minion v9.4.1.

I have the same issue.
If you have solved this, could you please help me resolve it?
Thank you

@ExplodingCabbage

I've seen the same thing. One species has >80% of all reads assigned to it, according to the output TSV, yet its abundance is still listed as 0.0, just like every other row.

@guokai8

guokai8 commented Oct 24, 2019

I have the same issue. I don't know why.

@mourisl
Collaborator

mourisl commented Oct 24, 2019

Can you check whether these are unique assignments or not? Thanks.

@guokai8

guokai8 commented Oct 25, 2019

Yes. I am sure there are unique reads here.

@Aiswarya-prasad

Has anyone been able to resolve this? I am having the same issue with nanopore reads.

@mourisl
Collaborator

mourisl commented Jan 7, 2020

I'm checking on this issue. The abundance estimation is on by default. Are any of the reads assigned at the subspecies (leaf) level? Can you show me a few lines of the report file? Thanks.

@jmaricb

jmaricb commented Apr 7, 2020

Hi, are there any updates with this issue? I seem to be getting zero abundances for every species. Here are the commands I have been using:

centrifuge \
  -x data/classifiers-DB/centrifuge/p_compressed+h+v \
  -p 8 \
  -f data/reads-fastq/ONT/communities-synthetic/integration_dataset.fasta \
  -S out \
  --report-file report

After that I also used this command to get kraken style report:

centrifuge-kreport \
  -x data/classifiers-DB/centrifuge/p_compressed+h+v \
  out > kraken_report

You can download the output here: https://www.dropbox.com/s/a5j415ixyts9lox/Archive.zip?dl=0
You can see that some species in the kraken_report file have high abundance, and that some species in the report file have a high number of unique reads, but the abundance is still zero for all rows.

@mourisl
Collaborator

mourisl commented Apr 7, 2020

Thanks for sharing the files. I'll look into this.

@mourisl
Collaborator

mourisl commented Apr 7, 2020

Your command uses the p_compressed+h+v index, but the seqID column in the output is not in the cid|XXX form that the compressed index produces. I guess the index you are actually using is p+h+v. Could you please check whether the index is correct?
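
One quick way to run this check on an existing classification output is to count how many seqID values carry the cid| prefix. The following is a minimal sketch, not part of Centrifuge: the function name and the `cid|12345` value are made up for illustration, and the accession-style IDs are of the kind posted in this thread. Rank names such as "species" can also appear in the seqID column, so a count of zero is a hint rather than proof.

```python
def count_compressed_seq_ids(seq_ids):
    """Count seqID values that start with the 'cid|' prefix."""
    return sum(1 for seq_id in seq_ids if seq_id.startswith("cid|"))


# Accession-style seqIDs like those posted in this thread.
observed = ["NC_018695.1", "NZ_CP016077.1", "NZ_CP033858.1"]

# Hypothetical example: 'cid|12345' is an invented ID used only to show the prefix.
hypothetical_compressed = ["cid|12345", "NC_018695.1"]

print(count_compressed_seq_ids(observed))
print(count_compressed_seq_ids(hypothetical_compressed))
```

If no seqID in a large output has the prefix, it is worth double-checking which index the `-x` argument actually points to.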

@jmaricb

jmaricb commented Apr 7, 2020

Hi, thank you for the response. Sorry, the data I sent you was classified with a custom database that I built from bacterial and archaeal genomes. The commands I used to build the database are:

centrifuge-download -o taxonomy taxonomy
centrifuge-download -o library -m -d "archaea,bacteria" refseq > seqid2taxid.map
cat library/*/*.fna > input-sequences.fna
centrifuge-build -p 10 --conversion-table seqid2taxid.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp input-sequences.fna abv

I think that succeeded, because as you can see the kraken report gives reasonable classification.

I am now sending new data: https://www.dropbox.com/s/cefkjfz0a4kq1ig/Data.zip?dl=0

  • The folder 'custom' contains the output, report and kraken-report for the dataset classified with that custom index.
  • The folder 'default' contains the output, report and kraken-report for the same dataset classified with the p_compressed+h+v index.
  • Finally, the folder 'integration' contains a small dataset which, when classified with the p_compressed+h+v index, gives abundances that are not zero. I don't know why. The output and the report are inside the directory.

Here is the dataset I've been using. It's quite large so I am sending it separately:
https://www.dropbox.com/s/jeuaho0slc45p9w/silico.fastq.zip?dl=0

@jmaricb

jmaricb commented Apr 8, 2020

@mourisl Hi, one more thing, in case it helps: my integration.fastq dataset also gives non-zero abundances with the custom index that I created, so the problem might not be in the indexes.

Here are the results of the classification with the custom index: https://www.dropbox.com/s/iojc2br7q17ru1m/integration_custom.zip?dl=0

@jmaricb

jmaricb commented Apr 9, 2020

@mourisl
Hi,
Could you help me calculate the abundances myself? I would like to do that, but in the Centrifuge output, reads are classified to multiple species. How can I determine which species each read should be assigned to? Is there a way for Centrifuge to pick a single species for a given read?

Thank You.

@mourisl
Collaborator

mourisl commented Apr 9, 2020

@jmaricb You can use the abundance from kreport directly. For reads with multiple assignments, the count is added to their lowest common ancestor in the taxonomy tree. You can also use "--no-lca" in kreport, which instead splits each read's count evenly among the taxa it was assigned to.

@jmaricb

jmaricb commented Apr 9, 2020

@mourisl
Sorry for bothering you, but just one more question.
If I have a read that is mapped to three tax ids, like this:
SRR5891470.22869 species 106654 676 676 41 2302 3
SRR5891470.22869 species 470 676 676 41 2302 3
SRR5891470.22869 NZ_CP033858.1 2420300 676 676 41 2302 3

In the report (let's say kreport), this read will be assigned to the lowest common ancestor of these three tax IDs (106654, 470, 2420300), which is Acinetobacter (tax ID = 469). Am I right?

Does this mean that only reads that map to single species will be assigned to that species?

Thank You.

@mourisl
Collaborator

mourisl commented Apr 9, 2020

@jmaricb Yes, that is the default behavior of kreport. You can use "--no-lca" in centrifuge-kreport to assign a fraction of the read to each species instead. Note that Centrifuge itself already assigns a read to the lowest common ancestor if it is assigned to too many species (-k option).
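
To make the default LCA behavior concrete, here is a minimal sketch of finding the lowest common ancestor of several tax IDs. It is an illustration only and not centrifuge-kreport's actual code. The `parent` map is a simplified, hypothetical stand-in for nodes.dmp in which the three tax IDs from the question hang directly under Acinetobacter (469), as described above; the real taxonomy may have intermediate nodes.

```python
def lineage(tax_id, parent):
    """Return the path from tax_id up to the root (the root is its own parent)."""
    path = [tax_id]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(tax_ids, parent):
    """Return the deepest node shared by the lineages of all tax_ids."""
    first_path = lineage(tax_ids[0], parent)
    common = set(first_path)
    for tax_id in tax_ids[1:]:
        common &= set(lineage(tax_id, parent))
    # first_path is ordered leaf to root, so the first shared node is the LCA.
    for node in first_path:
        if node in common:
            return node
    return None


# Simplified, hypothetical parent map (child -> parent); 1 is the root.
parent = {1: 1, 469: 1, 470: 469, 106654: 469, 2420300: 469}

lca = lowest_common_ancestor([106654, 470, 2420300], parent)
print(lca)
```

With this toy tree, the read from the question ends up at 469, so none of the three lower-level taxa gets a count from it unless "--no-lca" is used.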

@jmaricb

jmaricb commented Apr 9, 2020

@mourisl
Thank you very much. I think I got everything I need to calculate the abundances.

May I just know one last thing.
"--no-lca Do not report the LCA of multiple assignments, but report count fractions at the taxa."

How do you calculate count fractions for each species from multiple assignments when you use --no-lca?

@mourisl
Collaborator

mourisl commented Apr 9, 2020

@jmaricb If a read is assigned to 4 species, 0.25 is added to the abundance of each of the four species.
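
The fractional counting described here can be sketched in a few lines. This is an illustration under stated assumptions, not the code centrifuge-kreport runs: it assumes the classification output is tab-separated with the readID and taxID columns shown in this thread, and it assumes each read is split evenly across the distinct taxIDs it appears with. The function name `fractional_counts` is made up for the example.

```python
import csv
import io
from collections import defaultdict


def fractional_counts(tsv_text):
    """Split each read evenly across the distinct taxIDs it was assigned to."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    taxa_per_read = defaultdict(set)
    for row in reader:
        taxa_per_read[row["readID"]].add(row["taxID"])

    counts = defaultdict(float)
    for taxa in taxa_per_read.values():
        share = 1.0 / len(taxa)  # e.g. 4 taxa -> 0.25 each
        for tax_id in taxa:
            counts[tax_id] += share
    return dict(counts)


HEADER = "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches"
ROWS = [  # the three assignments of one read quoted earlier in this thread
    "SRR5891470.22869\tspecies\t106654\t676\t676\t41\t2302\t3",
    "SRR5891470.22869\tspecies\t470\t676\t676\t41\t2302\t3",
    "SRR5891470.22869\tNZ_CP033858.1\t2420300\t676\t676\t41\t2302\t3",
]
counts = fractional_counts("\n".join([HEADER] + ROWS))
for tax_id, value in sorted(counts.items()):
    print(tax_id, round(value, 3))
```

For the read with three assignments, each of the three tax IDs receives one third of a read. Dividing each tax ID's total by the number of classified reads would then give a simple read-fraction estimate, which is not the same thing as the genome-size-aware abundance column in the Centrifuge report.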

@jmaricb

jmaricb commented Apr 10, 2020

Thank you for your help.

@Adoni5

Adoni5 commented May 21, 2020

@mourisl

I am also using a compressed index (p_compressed, hosted on the site) with Nanopore reads, and am getting an abundance of 0. I am building a custom index of bacteria from RefSeq to test whether the compressed indexes are the problem, but was wondering if there is anything else you would recommend trying?

Sample output -

readID	seqID	taxID	score	2ndBestScore	hitLength	queryLength	numMatches
2bef9c72-eeab-4b54-b7a0-4f4696866878	NC_018695.1	1229205	225	225	30	215	2
a6d6c54d-b1e2-45ee-858f-0cb61d0fc2f5	NZ_CP016077.1	1612551	121	121	26	439	2

Sample report -

name	taxID	taxRank	genomeSize	numReads	numUniqueReads	abundance
Myxococcales	29	order	9697933	1	0	0.0
Cystobacter fuscus	43	species	12349744	1	1	0.0
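
A small script can flag the rows that show the symptom reported in this thread, that is, taxa with at least one unique read but an abundance of 0. This is a minimal sketch that assumes the report is tab-separated with the header shown above; the function name is made up for the example, and the sample text simply repeats the two report rows from this comment.

```python
import csv
import io


def unique_reads_but_zero_abundance(report_text):
    """Return (name, numUniqueReads) for rows with unique reads but abundance 0."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    flagged = []
    for row in reader:
        unique = int(row["numUniqueReads"])
        if unique > 0 and float(row["abundance"]) == 0.0:
            flagged.append((row["name"], unique))
    return flagged


sample_report = "\n".join([
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance",
    "Myxococcales\t29\torder\t9697933\t1\t0\t0.0",
    "Cystobacter fuscus\t43\tspecies\t12349744\t1\t1\t0.0",
])
flagged = unique_reads_but_zero_abundance(sample_report)
print(flagged)
```

On the two sample rows, only Cystobacter fuscus is flagged, since Myxococcales has no unique reads. To run it on a real report, read the file's contents into a string and pass that in.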

@xiechangxiao

I have the same issue. The abundance value is always 0 when I use the latest version of Centrifuge and the h+p+v+c database to analyze Nanopore data. Could you help me fix it? Thank you.
Here is my command:
centrifuge -x database/centrifuge_databases/hpvc/hpvc -U BC_25.fq.gz --report-file BC_25.report -S BC_25.output

@tanushrin

I am having an issue with the abundance estimation: I get an abundance of 0 for most species, except for one species (abundance value: 1). In centrifuge_report.txt there are species with high abundance; however, centrifuge_report.tsv shows the abundance as 0.
I created a custom database : archaea, bacteria, protozoa, fungi, plant, algae

Here are the centrifuge commands I have been using:

centrifuge-build -p 24 --conversion-table $REF_SEQ_DIR/accession2taxid_cent.map --taxonomy-tree $REF_SEQ_DIR/nodes.dmp --name-table $REF_SEQ_DIR/names.dmp $DB.fa $DB > $DB.log

centrifuge -p 24 -x $DB -q in.fq > out.txt

centrifuge-kreport -x $DB out.txt > centrifuge_report.txt

How can I get proper (non-zero) abundance values? I would appreciate any help.

Thank you!

@Kumereng

Hi, I have exactly the same issue, and it has not been resolved. The abundance is also zero.

@lixiaopi1985

Same issue with the latest Centrifuge.

@mourisl
Collaborator

mourisl commented Aug 16, 2021

I just fixed an issue with estimating average genome sizes, which was also related to the abundance estimation procedure. Could you please try the new version and check whether the abundance values become normal? You don't need to rebuild the index.

@BaylorLyu

> I just fixed an issue with estimating average genome sizes, which was also related to the abundance estimation procedure. Could you please try the new version and check whether the abundance values become normal? You don't need to rebuild the index.

The problem still exists in the current version; only a few entries have an abundance value.

@sybrohee

sybrohee commented Mar 9, 2023

Unfortunately, still having the same issue. All abundances stay equal to 0.0 and no iteration was performed.

@mourisl
Collaborator

mourisl commented Mar 9, 2023

I can reproduce the zero abundance issue on one of the data sets. I'm working on it now, and it seems more complex than I thought.

@sybrohee

@mourisl Thank you for considering the issue (and all your nice work with centrifuge)
