Discrepancies in L1 norm between opal and CAMI #23
Comments
The L1 norm results don't match because the indicated code normalizes the abundances by default. For each rank, it sums all abundances and then divides the abundance of each taxon by that sum. For example, the CAMI gold standard lc contains two taxa for superkingdom:
10239 superkingdom 10239 Viruses 6.3464
The considered abundances will be:
10239: 0.1817525732
Do we want this normalization in OPAL? It also affects many other metrics.
Another difference: the indicated code looks for multiple predictions for the same taxon in the same profile and sums the repeated predictions. OPAL only considers one prediction per taxon, which seems logical.
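To make the per-rank normalization described above concrete, here is a minimal sketch. It is not the CAMI or OPAL code; the function name `normalize_per_rank`, the `(rank, taxid, abundance)` tuple layout, and the abundance values are assumptions chosen purely for illustration, and it assumes each taxon appears at most once per rank.

```python
from collections import defaultdict


def normalize_per_rank(profile):
    """Divide each abundance by the total abundance of its rank.

    `profile` is a list of (rank, taxid, abundance) tuples. This layout is
    a hypothetical stand-in for a parsed profile, not OPAL's data model.
    """
    # First pass: total abundance per rank.
    totals = defaultdict(float)
    for rank, _taxid, abundance in profile:
        totals[rank] += abundance

    # Second pass: rescale so that each rank sums to 1 (ranks with a zero
    # total are left at 0.0 to avoid dividing by zero).
    return {
        (rank, taxid): (abundance / totals[rank] if totals[rank] > 0 else 0.0)
        for rank, taxid, abundance in profile
    }


# Illustrative values only: a rank whose abundances sum to 40 rather than 100.
example = [("superkingdom", "A", 30.0), ("superkingdom", "B", 10.0)]
normalized = normalize_per_rank(example)
```

After normalization the two illustrative taxa sum to 1 at their rank, which is why a gold standard that does not sum to 100 percent ends up with rescaled values.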
In principle, this will make a difference if things are left unassigned at a rank, which can be the case.
I do not understand, though, why the gold standard in this example does not sum up to 100 percent at domain level. Is this because of the circular elements? In that example, would it then correspond to the filtering? David?
We might want to make it an option whether to do this or not.
Best,
Alice
On 08.12.2017 at 18:25, fernandomeyer <notifications@github.com> wrote:
Another difference: the indicated code looks for multiple predictions for the same taxon in the same profile and sums the repeated predictions. OPAL only considers one prediction per taxon, which seems logical.
@fernandomeyer the issue with summing up multiple predictions was my attempt at error handling. Using just one (or the first) of multiple predictions also makes sense (but is somewhat arbitrary). In general, just taking the first prediction might lead to unexpected results, but it's sort of the user's fault for a malformed *.profile file. So whichever direction you choose to go is fine with me.
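The two ways of handling repeated predictions for the same taxon could look roughly like the sketch below. The function `collapse_duplicates` and its `strategy` parameter are hypothetical names for illustration; neither is taken from OPAL or the CAMI evaluation code.

```python
def collapse_duplicates(entries, strategy="sum"):
    """Collapse repeated taxon IDs in one rank of a profile.

    `entries` is an iterable of (taxid, abundance) pairs.
    strategy="sum" adds repeated predictions together;
    strategy="first" keeps only the first prediction seen per taxon.
    """
    if strategy not in ("sum", "first"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    collapsed = {}
    for taxid, abundance in entries:
        if taxid not in collapsed:
            collapsed[taxid] = abundance
        elif strategy == "sum":
            collapsed[taxid] += abundance
        # strategy == "first": later duplicates are ignored.
    return collapsed


# Illustrative malformed input: taxon "562" is predicted twice.
entries = [("562", 10.0), ("562", 5.0), ("1280", 2.0)]
summed = collapse_duplicates(entries, strategy="sum")
first_only = collapse_duplicates(entries, strategy="first")
```

With a well-formed profile both strategies give the same result, so the choice only matters for malformed input. A third option, not shown here, would be to reject such a file outright.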
With respect to the normalization: So in summary, I do think we should allow this as an option (which is the way I coded it originally, if I recall correctly).
With normalization (now default option in OPAL), OPAL matches the L1 norm of the results in https://github.com/CAMI-challenge/firstchallenge_evaluation/tree/master/profiling/data/submissions_evaluation/56bb3485727d7a24678adf67
Already implemented:
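For reference, the effect of the normalization option on the L1 norm can be sketched as follows. This is a minimal illustration under stated assumptions, not the code referenced in this issue: `l1_norm`, its `normalize` flag, and the abundance values are hypothetical, and profiles are represented as plain `taxid -> abundance` dicts for a single rank.

```python
def _normalize(abundances):
    """Rescale a taxid -> abundance dict so that its values sum to 1."""
    total = sum(abundances.values())
    if total <= 0:
        return dict(abundances)
    return {taxid: value / total for taxid, value in abundances.items()}


def l1_norm(gold, pred, normalize=True):
    """Sum of absolute abundance differences over all taxa at one rank.

    Taxa missing from one of the two profiles count as abundance 0.
    """
    if normalize:
        gold, pred = _normalize(gold), _normalize(pred)
    taxa = set(gold) | set(pred)
    return sum(abs(gold.get(t, 0.0) - pred.get(t, 0.0)) for t in taxa)


# Illustrative values only: the gold standard does not sum to 100 at this rank,
# and the prediction leaves part of the sample unassigned.
gold = {"A": 30.0, "B": 10.0}
pred = {"A": 20.0}
with_normalization = l1_norm(gold, pred, normalize=True)
without_normalization = l1_norm(gold, pred, normalize=False)
```

The two results differ whenever a profile does not sum to the same total at a rank, which is the situation with unassigned abundance raised earlier in the thread.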
Metrics in CAMI were computed with this code. See lines 154-175 for the computation of L1 norm.