
Discrepancies in L1 norm between opal and CAMI #23

Closed
dkoslicki opened this issue Dec 7, 2017 · 7 comments

@dkoslicki
Member

Metrics in CAMI were computed with this code. See lines 154-175 for the computation of L1 norm.

@fernandomeyer
Contributor

The results for L1 norm don't match because the indicated code normalizes the abundances by default. For each rank, it sums up all abundances and then divides the abundance of each taxon by that sum.

For example, the CAMI gold standard lc contains two taxa for superkingdom:

10239 superkingdom 10239 Viruses 6.3464
2 superkingdom 2 Bacteria 28.5714

The considered abundances will be:

10239: 0.1817525732
2: 0.8182474268
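A minimal sketch of the per-rank normalization described above (an illustrative reimplementation, not the actual CAMI/OPAL code): sum all abundances at a rank, then divide each taxon's abundance by that sum.

```python
def normalize_rank(abundances):
    """Normalize one rank's abundances so they sum to 1.

    abundances: dict mapping taxid -> abundance at a single rank.
    """
    total = sum(abundances.values())
    if total == 0:
        return dict(abundances)
    return {taxid: a / total for taxid, a in abundances.items()}

# Superkingdom entries of the CAMI gold standard lc from the example above
superkingdom = {"10239": 6.3464, "2": 28.5714}
normalized = normalize_rank(superkingdom)
# normalized["10239"] ≈ 0.1817525732, normalized["2"] ≈ 0.8182474268
```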

Do we want this normalization in OPAL? It also affects many other metrics.

@fernandomeyer
Contributor

Another difference: the indicated code looks for multiple predictions for the same taxon in the same profile, summing up the repeated predictions. OPAL only considers one prediction per taxon, which seems logical.

@alicemchardy

alicemchardy commented Dec 10, 2017 via email

@fernandomeyer fernandomeyer self-assigned this Dec 18, 2017
@dkoslicki
Member Author

@fernandomeyer the issue with summing up multiple predictions was my attempt at error handling. Using just one (or the first) of multiple predictions also makes sense (but is somewhat arbitrary). In general, just taking the first prediction might lead to unexpected results, but it's sort of the user's fault for a malformed *.profile file. So whichever direction you choose to go is fine with me.

@dkoslicki
Member Author

With respect to the normalization:
The rationale for normalization was that it standardizes (somewhat) the metric values. Without normalizing, the metric is "biased" towards samples that make fewer predictions. For example, if a tool only makes a prediction for 1% of the sample, the metric will be at worst 1.01, whereas a tool that predicts 50% of the abundances exactly correctly will have an L1 norm of 1. Normalizing would change this to 1.99 in the former case (close to the maximal value of 2), and still 1 in the latter.
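A toy sketch of that bias (hypothetical taxa, not OPAL code): a tool that places 1% of its abundance on a wrong taxon gets a raw L1 norm barely above 1, but a score near the maximum of 2 once its prediction is normalized.

```python
def l1_norm(gold, pred):
    """L1 distance between two abundance profiles (dicts taxid -> abundance)."""
    taxa = set(gold) | set(pred)
    return sum(abs(gold.get(t, 0.0) - pred.get(t, 0.0)) for t in taxa)

def normalize(profile):
    """Rescale a profile so its abundances sum to 1."""
    total = sum(profile.values())
    return {t: a / total for t, a in profile.items()} if total else dict(profile)

gold = {"A": 0.5, "B": 0.5}
sparse = {"C": 0.01}  # predicts only 1% of the sample, on a wrong taxon

l1_norm(gold, sparse)              # → 1.01 raw ("at worst 1.01")
l1_norm(gold, normalize(sparse))   # → 2.0 after normalizing the prediction
```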

So in summary, I do think we should allow this as an option (which is the way I coded it originally, if I recall correctly).

@fernandomeyer
Contributor

With normalization (now the default option in OPAL), OPAL matches the L1 norm of the results in https://github.com/CAMI-challenge/firstchallenge_evaluation/tree/master/profiling/data/submissions_evaluation/56bb3485727d7a24678adf67
However, the UniFrac values no longer match. From the results above, one can conclude that normalized abundances were used to compute the L1 norm but not UniFrac. Is this the desired behavior?

@fernandomeyer
Contributor

Already implemented:
- Abundances will be normalized by default for all metrics, as discussed.
- Multiple predictions for the same taxon will be summed up.
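The duplicate-handling behavior can be sketched as follows (assumed input shape of `(taxid, abundance)` pairs; not the actual OPAL parser):

```python
from collections import defaultdict

def merge_predictions(rows):
    """Sum repeated predictions for the same taxon within one profile.

    rows: iterable of (taxid, abundance) pairs.
    """
    merged = defaultdict(float)
    for taxid, abundance in rows:
        merged[taxid] += abundance  # repeated taxids accumulate
    return dict(merged)

merge_predictions([("2", 10.0), ("10239", 5.0), ("2", 2.5)])
# → {"2": 12.5, "10239": 5.0}
```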
