--report-minimizer-data; distinct minimizer exceeds inspect minimizer #445

Open
NienkeMekkes opened this issue May 4, 2021 · 5 comments

Comments

@NienkeMekkes

Dear authors,

The new --report-minimizer-data option is a very promising feature! I do have a question about it. When I run kraken2-inspect on my database, one column is described as: "amount of database minimizers that map to a taxon rooted in this clade". When I run kraken2 with --report-minimizer-data, the estimate in the distinct-minimizer column can be higher than this inspect value. I expected the inspect value to be the maximum number of distinct minimizers that can be found in that clade. Why is this not the case?

Thanks

For example, in my database ~300.000 minimizers are rooted at the species (rank S) Bacteroides fragilis. In my kraken2 output, I found ~1.370.000 distinct minimizers for Bacteroides fragilis.

@kdbchau

kdbchau commented May 4, 2021

What is the command you are using?

@NienkeMekkes
Author

For running Kraken2, I typically use: kraken2 reads/ --db krakendb --paired --output sampleID_kraken_output.txt --report sampleID_kraken_report.txt --report-minimizer-data. The row for Bacteroides fragilis looks like:

20.62 458469 442261 21635904 1372051 S 817 Bacteroides fragilis

For kraken2-inspect, I use: kraken2-inspect --db krakendb. The row for Bacteroides fragilis looks like:

0.03 302714 290736 S 817 Bacteroides fragilis
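To make the comparison concrete, here is a minimal Python sketch that parses the two rows quoted above and compares them. The column order is an assumption based on the Kraken2 manual's description of the two formats (report with --report-minimizer-data: percent, clade reads, taxon reads, minimizers, distinct minimizers, rank, taxid, name; inspect: percent, clade minimizers, taxon minimizers, rank, taxid, name). The function names are made up for illustration and are not part of Kraken2. Splitting on whitespace also discards the indentation that real report files put in front of the name.

```python
def parse_report_row(line):
    """Parse one row of a kraken2 report written with --report-minimizer-data.

    Assumed column order: percent, clade reads, taxon reads, minimizers,
    distinct minimizers (estimate), rank, taxid, name.
    """
    f = line.split(None, 7)  # at most 8 fields, so spaces in the name survive
    return {
        "percent": float(f[0]),
        "clade_reads": int(f[1]),
        "taxon_reads": int(f[2]),
        "minimizers": int(f[3]),
        "distinct_minimizers": int(f[4]),
        "rank": f[5],
        "taxid": int(f[6]),
        "name": f[7].strip(),
    }


def parse_inspect_row(line):
    """Parse one row of kraken2-inspect output.

    Assumed column order: percent, clade minimizers, taxon minimizers,
    rank, taxid, name.
    """
    f = line.split(None, 5)
    return {
        "percent": float(f[0]),
        "clade_minimizers": int(f[1]),
        "taxon_minimizers": int(f[2]),
        "rank": f[3],
        "taxid": int(f[4]),
        "name": f[5].strip(),
    }


report_row = parse_report_row(
    "20.62 458469 442261 21635904 1372051 S 817 Bacteroides fragilis"
)
inspect_row = parse_inspect_row("0.03 302714 290736 S 817 Bacteroides fragilis")

# How many times larger is the distinct-minimizer estimate than the number
# of database minimizers in the clade?
ratio = report_row["distinct_minimizers"] / inspect_row["clade_minimizers"]
print(f"{ratio:.2f}")  # prints 4.53
```

So for this taxon the reported distinct-minimizer estimate is roughly 4.5 times the number of minimizers the database holds for the whole clade.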

@mihkelvaher

Seems like a duplicate of #392

@phspo

phspo commented Jan 5, 2023

I can confirm this. Reading the source code is a bit confusing, since the option is referred to there as "report kmer data" rather than minimizer data. Maybe the number is in fact the number of assigned k-mers? Or does it also count distinct minimizers that don't belong to the taxon a read was assigned to?

As a suggestion, it could also be helpful to output minimizers/unique minimizers at the node level, in addition to the subtree rooted at a specific node (this can be calculated from the subtree counts, or bottom-up for the entire tree).
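As a rough illustration of that calculation, the sketch below derives node-level counts from clade-level counts by subtracting each node's direct children from its own clade total. It assumes the rows arrive in the report's top-down order and that the depth of each row is known (in report files it can be read from the indentation of the name column). The function name, the taxids, and the counts are all made up for illustration. For the distinct-minimizer column the subtraction can only be approximate, because those values are estimates.

```python
def node_level_counts(rows):
    """Turn clade-level counts into node-level counts.

    rows: list of (depth, taxid, clade_count) tuples in top-down
    (pre-order) order, as in a kraken2 report.
    Returns {taxid: clade_count minus the clade_counts of direct children}.
    """
    node = {}
    stack = []  # (depth, taxid) of the ancestors of the current row
    for depth, taxid, clade_count in rows:
        # Drop entries that are not ancestors of this row.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        node[taxid] = clade_count
        if stack:
            # Remove this child's clade total from its direct parent.
            node[stack[-1][1]] -= clade_count
        stack.append((depth, taxid))
    return node


# Toy tree with hypothetical taxids and counts:
# 1 (100) -> 2 (60) -> 3 (25), 4 (30); 1 -> 5 (10)
toy_rows = [(0, 1, 100), (1, 2, 60), (2, 3, 25), (2, 4, 30), (1, 5, 10)]
print(node_level_counts(toy_rows))
# prints {1: 30, 2: 5, 3: 25, 4: 30, 5: 10}
```

Note that kraken2-inspect already prints a taxon-level column next to the clade-level one (290736 vs 302714 in the row above), so this mainly concerns the minimizer columns of the report.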

@phspo

phspo commented Jan 5, 2023

It looks like this would count any minimizer found in the read, even if it doesn't match the taxon that gets assigned as the final classification?

if (taxon) {
