The Biolink model contains a tree of predicates under the `related_to` slot. Each predicate can have mappings to CURIE-based vocabulary terms. For example, there might be a chunk that looks like:
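The following sketch represents such a chunk as Python data. Based on how predicates 1 to 3 and terms m1 to m3 are used below, it assumes that predicate 3 sits under predicate 2, which sits under predicate 1, and that each predicate n is mapped to the placeholder term mn. The names `PREDICATES` and `lineage` are hypothetical and are not part of the Biolink model or its tooling.

```python
# Hypothetical slice of the Biolink predicate tree. The names predicate_1..3
# and the mapped CURIEs m1..m3 are placeholders, not real Biolink content.
PREDICATES = {
    "predicate_1": {"parent": "related_to", "mappings": ["m1"]},
    "predicate_2": {"parent": "predicate_1", "mappings": ["m2"]},
    "predicate_3": {"parent": "predicate_2", "mappings": ["m3"]},
}


def lineage(predicate: str) -> list[str]:
    """Return the predicate and its ancestors, from most to least granular."""
    chain = []
    while predicate in PREDICATES:
        chain.append(predicate)
        predicate = PREDICATES[predicate]["parent"]
    chain.append(predicate)  # related_to, the root, has no entry of its own
    return chain


print(lineage("predicate_3"))
# ['predicate_3', 'predicate_2', 'predicate_1', 'related_to']
```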
Currently, edge normalization takes a CURIE. The first thing it does is look for an exact mapping: if the input CURIE is m2, it returns predicate 2; if the input CURIE is m1, it returns predicate 1. In other words, it first tries to match the granularity of the input term as closely as it can.
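The exact-match step amounts to a reverse lookup from vocabulary CURIE to predicate. A minimal sketch, using placeholder predicates `predicate_1` to `predicate_3` with mappings `m1` to `m3`, and assuming each CURIE is mapped by at most one predicate; `exact_match` is a hypothetical name, not the edge normalizer's actual API.

```python
from typing import Optional

# Placeholder predicates and the vocabulary CURIEs mapped to each.
PREDICATE_MAPPINGS = {
    "predicate_1": ["m1"],
    "predicate_2": ["m2"],
    "predicate_3": ["m3"],
}

# Invert once so that every CURIE points at the predicate it maps to.
CURIE_TO_PREDICATE = {
    curie: predicate
    for predicate, curies in PREDICATE_MAPPINGS.items()
    for curie in curies
}


def exact_match(curie: str) -> Optional[str]:
    """Return the predicate whose mappings contain `curie`, or None."""
    return CURIE_TO_PREDICATE.get(curie)


print(exact_match("m2"))   # predicate_2
print(exact_match("m1"))   # predicate_1
print(exact_match("RO3"))  # None
```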
If there is no exact match, however, edge normalization attempts to find the most granular match it can. Currently it can only do this for RO terms. Say that RO contains a grouping of terms like:
- RO1
  - RO2
    - RO3
Say you pass RO3 to edgenorm and it is not an exact match. Edgenorm then goes up the RO hierarchy and checks whether RO2 matches a term in a Biolink mapping. If it does not, it moves on to RO1 and checks whether that is an exact match. As soon as it finds a match, it returns the corresponding predicate. Sometimes the matching RO term is mapped to a leaf node (say RO2 == m3, in which case predicate 3 is returned) and sometimes it is not (say RO2 == m1, in which case predicate 1 is returned).
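The fallback described above is a walk up the RO hierarchy that stops at the first term with an exact mapping. A minimal sketch, assuming `RO1`, `RO2`, and `RO3` are placeholder identifiers rather than real RO CURIEs, and that each term has a single parent recorded in a hypothetical `RO_PARENT` table (real RO relations can in principle have more than one superproperty, which this sketch ignores). `normalize` is likewise a hypothetical name.

```python
from typing import Optional

# Placeholder RO fragment, child -> parent: RO3 sits under RO2, which sits under RO1.
RO_PARENT = {"RO3": "RO2", "RO2": "RO1"}


def normalize(curie: str, mappings: dict[str, str]) -> Optional[str]:
    """Return the predicate for `curie`, or for its nearest mapped RO ancestor."""
    term: Optional[str] = curie
    while term is not None:
        if term in mappings:          # exact mapping: stop at the most granular hit
            return mappings[term]
        term = RO_PARENT.get(term)    # otherwise climb one level in RO
    return None                       # nothing in the lineage is mapped


# RO2 is the CURIE behind leaf predicate 3 ("RO2 == m3").
print(normalize("RO3", {"RO2": "predicate_3"}))  # predicate_3
# RO2 is the CURIE behind predicate 1 ("RO2 == m1").
print(normalize("RO3", {"RO2": "predicate_1"}))  # predicate_1
# Only RO1 is mapped, so the walk continues to the top of the grouping.
print(normalize("RO3", {"RO1": "predicate_1"}))  # predicate_1
```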
So the current approach is to find the most granular predicate that can reasonably be mapped to the input CURIE. If users want a less granular predicate, we force them to interrogate the Biolink model themselves.
But there's no reason edgenorm couldn't do that for them as well. It could take as input a set of predicates representing the user's preferred level of aggregation for whatever task they are engaged in. So maybe they pass in [..., predicate 2, ...].
In that case, if their input CURIE gets mapped to either predicate 2 or predicate 3, predicate 2 is returned. If it maps to predicate 1, it cannot necessarily be pushed down to predicate 2, so predicate 1 is returned.
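The proposed parameter can be sketched as a post-processing step on whatever predicate edgenorm already returns: climb from that predicate toward the root and stop at the first one the user listed. This is a minimal sketch assuming the placeholder hierarchy predicate 3 under predicate 2 under predicate 1; `PARENT` and `aggregate` are hypothetical names.

```python
from typing import Optional

# Placeholder predicate hierarchy, child -> parent.
PARENT = {
    "predicate_3": "predicate_2",
    "predicate_2": "predicate_1",
    "predicate_1": "related_to",
}


def aggregate(predicate: str, preferred: set[str]) -> str:
    """Coarsen `predicate` to its nearest ancestor-or-self in `preferred`.

    If nothing in its lineage is preferred, the predicate is returned unchanged,
    because it cannot safely be pushed down to a more specific predicate.
    """
    node: Optional[str] = predicate
    while node is not None:
        if node in preferred:
            return node
        node = PARENT.get(node)
    return predicate


preferred = {"predicate_2"}
print(aggregate("predicate_3", preferred))  # predicate_2
print(aggregate("predicate_2", preferred))  # predicate_2
print(aggregate("predicate_1", preferred))  # predicate_1
```

Stopping at the nearest listed ancestor is a design choice in this sketch: if a user lists both predicate 1 and predicate 2, an edge normalized to predicate 3 comes back as predicate 2, the more granular of the two.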
In other words, this approach can only make the output less granular than the current approach, not more granular. The only way to get more granular output is to add more granular predicates to the model, either individually or in bulk (i.e., all RO terms are now in the model).
One add-on to this would be a few prebuilt profiles (['pharmacology', 'genetics', 'robokop', 'rtx', 'classic'] or whatever), each representing a particular set of tradeoffs (e.g., "I want the most granular chemistry predicates, but I don't care about a bunch of fine-grained anatomical predicates").
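Profiles could then be plain named sets of preferred predicates, optionally extended with the user's own list. A minimal sketch: the profile names `classic` and `pharmacology` come from the list above, but their contents here are placeholders, and `PROFILES`, `resolve_preferred`, and `aggregate` are hypothetical names.

```python
from typing import Iterable, Optional

# Placeholder predicate hierarchy, child -> parent.
PARENT = {
    "predicate_3": "predicate_2",
    "predicate_2": "predicate_1",
    "predicate_1": "related_to",
}

# Placeholder profile contents; real profiles would need to be curated.
PROFILES: dict[str, frozenset[str]] = {
    "classic": frozenset({"predicate_1"}),
    "pharmacology": frozenset({"predicate_3"}),
}


def resolve_preferred(profile: Optional[str] = None, extra: Iterable[str] = ()) -> set[str]:
    """Combine a named profile with any predicates the user lists explicitly."""
    preferred = set(PROFILES[profile]) if profile else set()
    preferred.update(extra)
    return preferred


def aggregate(predicate: str, preferred: set[str]) -> str:
    """Nearest ancestor-or-self of `predicate` in `preferred`, else `predicate`."""
    node: Optional[str] = predicate
    while node is not None:
        if node in preferred:
            return node
        node = PARENT.get(node)
    return predicate


print(aggregate("predicate_3", resolve_preferred("classic")))                   # predicate_1
print(aggregate("predicate_3", resolve_preferred("pharmacology")))              # predicate_3
print(aggregate("predicate_3", resolve_preferred("classic", ["predicate_2"])))  # predicate_2
```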
Thoughts @cmungall @cbizon @RichardBruskiewich @mbrush @andrewsu @saramsey @TriageDr @deepakunni3?

> In other words, this approach can only make the output less granular than the current approach, not more granular. The only way to get more granular output is to add more granular predicates to the model, either individually or in bulk (i.e., all RO terms are now in the model).

That is true if Biolink Model specifies both a vocabulary and a specific "allowed subset" of that vocabulary (the `RO1`, `RO2`, and `RO3` in your example above). But what if we only asked Biolink Model to specify allowed vocabularies? Wouldn't that give users full flexibility to specify their granularity of interest as a parameter to edgenorm, without being constrained by any centralized decisions? (I suppose this is functionally equivalent to @cbizon's suggestion of "add items in bulk (i.e. all RO terms are now in the model)".)

(Note also that I realize Biolink Model has applications outside of Translator, so if it's impractical to make such a fundamental change to Biolink Model, perhaps that argues for something similar-but-different...)